ABSTRACT


This  report  presents  the  findings  of  the  first  phase  of  a  research 
project to investigate the problems which exist regarding subcultural
differences in the prediction of job performance. Phase I of the project
was  an  attempt  to  obtain  an  adequate  picture  of  the  effects  of  cultural 
factors  on  existing  selection  procedures.  Seven  independent  studies  were 
conducted  in  which  the  validity  of  commercial  and  industrially  developed 
selection  tests  was  examined  separately  for  white  and  Negro  subgroups  of 
the  population  using  the  eleven  different  relationships  presented  in  the 
Bartlett  and  O’Leary  (1969)  model.  Occupational  groups  which  were  studied 
included  toll  collectors,  correctional  officers,  toll  facility  officers, 
various clerical workers, and keypunch operators. A sample of inmates in
a federal correctional institution was also studied.

The  results  of  Phase  I  indicated  that  test  bias  is  clearly  present 
in  a  large  number  of  cases  where  heterogeneous  groups  are  combined  in 
making predictions of job performance. However, it is erroneous to
conclude that all inadvertent test bias denies opportunities to minority
group members. The present study has demonstrated the need to validate
tests separately for minority and majority group members. The traditional
validation model which assumes homogeneous populations is clearly
inappropriate.

The  second  phase  of  the  project  will  involve  the  evaluation  of 
procedures to control or eliminate bias. Differential prediction models,
culture-equivalent tests, learning measures, as well as some non-cognitive
measures will be examined.
INTRODUCTION


Equal opportunity for minority group members in industrial and educational
institutions  has  become  an  area  of  national  concern.  Both  professionals  and 
laymen  have  claimed  that  many  of  the  current  methods  of  assessing  abilities 
may  systematically  deny  opportunities  for  minority  groups. 

Although  there  is  considerable  agreement  that  a  problem  exists  regarding 
subcultural differences in the prediction of job performance (see APA Task
Force on Employment Testing of Minority Groups, 1969), there is a need to
learn more about the nature of the problem. Bartlett and O'Leary (1969) have
developed a model which demonstrates possible relationships which may exist
when heterogeneous groups are combined in making predictions. Viewing this
model in terms of subcultural bias, it becomes apparent that there are a
number of different situations where inadvertent test discrimination may be
found. More important, however, is the realization that solutions to the
problem of test bias are dependent upon the nature of the existing
relationship between the tests and the criterion.

No  single  technique,  such  as  culture-free  tests  or  test-taking  training, 
will solve all problems, but each may be useful in certain situations.
However, until a basic parametric study is conducted to determine the nature of
the  problem,  haphazard  applications  of  the  various  techniques  which  have  been 
suggested  as  solutions  may  lead  to  the  elimination  of  some  potentially  useful 
techniques.  For  example,  one  may  be  using  test-taking  training  to  eliminate 
unfair  discrimination  in  situations  which  call  for  differential  prediction, 
as  in  the  example  where  one  test  has  positive  validity  for  one  subgroup  and 
negative  validity  for  another. 

Guion (1966) has alluded to the need for a basic parametric study,
stating  that  there  is  no  evidence  now  available  to  indicate  which  models  will 
be  most  useful  for  eliminating  unfair  discrimination  in  testing.  The  present 
project  was  a  response  to  this  need. 

Phase I of this two-part project, essentially exploratory in nature, was
an attempt to obtain an adequate picture of the effects of cultural factors
on existing selection procedures. More specifically, an attempt was made to
determine the frequency of occurrence of the eleven different relationships
presented in the Bartlett and O'Leary (1969) model, as well as how pervasive
these  relationships  are  across  a  number  of  different  types  of  tests  and 
criteria.  Phase  II  activities,  currently  in  progress,  are  directed  toward 
the  development  and  experimental  evaluation  of  procedures  to  control  or 
eliminate  test  bias. 

The  present  technical  report  describes  the  results  of  Phase  I  research 
efforts. Over 30 different organizations were contacted in an effort to
obtain test validation data.¹ Data were obtained from approximately 20
percent of those contacted. Many of the organizations contacted did not have
enough minority group members in similar job classifications to obtain a
separate  validation  sample.  In  addition,  many  agencies  were  reluctant  to 
release  data  because  of  the  controversial  nature  of  the  topic. 

Test  validation  research  for  minority  groups  presents  a  number  of  unique 
methodological  problems.  First,  since  often  only  a  few  minority  group  members 
are  employed  in  a  specific  job  classification,  it  is  virtually  impossible  to 
divide  the  groups  for  purposes  of  cross-validation.  Secondly,  because  of  the 
rather large differential in sample size, validity coefficients of equal
magnitude are often not statistically significant for the minority sample but
significant  for  the  white  sample. 
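
The second problem can be illustrated with the usual t test for a product-moment correlation. The sketch below is only an illustration: the coefficient and the two sample sizes are hypothetical values, not data from the studies reported here.

```python
import math


def t_for_r(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, evaluated with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical example: the same validity coefficient in two subgroups
# that differ only in sample size.
r = 0.20
for label, n in [("larger sample", 400), ("smaller sample", 40)]:
    print(label, n, round(t_for_r(r, n), 2))
```

With the two-tailed .05 critical value of t near 2.0 for both sample sizes, the identical coefficient reaches significance in the larger sample but not in the smaller one.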

The Bartlett-O'Leary model, which was being evaluated in this
investigation, assumes that subgroup differences on the criterion measures are a function
of actual differences in job performance. Although a few of the studies
reported contain objective criteria, the most frequently used criterion was
supervisory ratings of job performance. In most of the studies, meetings were
held with supervisors to familiarize them with the rating scales and to stress
the experimental nature of the ratings. Moreover, racial identification was
obtained  for  each  employee  after  the  ratings  had  been  collected.  Despite  these 
precautionary  steps,  no  estimate  was  available  concerning  the  nature  and  extent 
of  bias  affecting  these  ratings  for  the  two  racial  groups. 

¹ Academic and governmental institutions, as well as industrial
organizations, were contacted.


Section  I:  Historical  Background 


In  recent  years  there  has  been  an  increasing  awareness  of  the  need  for 
socially  responsible  behavior  on  the  part  of  all  kinds  of  organizations.  The 
passage of the Civil Rights Act of 1964 has made the issue of discrimination
in personnel selection a legal as well as a moral one. In particular, doubts
have been raised about psychological tests used in personnel selection (Amrine,
1965). These tests have come under attack on many fronts for alleged bias
against minority and culturally disadvantaged groups. The purpose of this
investigation is to determine if this bias actually does exist by examining the
relation between selection tests and job performance in a variety of
occupational groups in which both majority and minority group members are employed.

Concepts  of  Bias 

The  definition  of  test  bias  used  in  the  present  study  was  that  of  Cleary 
(1966, p. 1): "A test is biased for members of a subgroup of the population if,
in the prediction of a criterion for which the test was designed, consistent
nonzero errors of prediction are made for members of the subgroup. In other
words, the test is biased if too high or too low a criterion score is
consistently predicted for members of the subgroup when the common regression line
is used."
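
Cleary's definition can be checked directly: fit the common regression line to the combined sample and look at the mean error of prediction within each subgroup. The following is a minimal sketch under stated assumptions; the function names and the data are hypothetical and are not taken from any study cited in this report.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_error_by_group(test, criterion, group):
    """Mean (actual - predicted) criterion score per subgroup,
    using the common regression line fitted to the combined sample."""
    slope, intercept = fit_line(test, criterion)
    totals, counts = {}, {}
    for x, y, g in zip(test, criterion, group):
        totals[g] = totals.get(g, 0.0) + (y - (intercept + slope * x))
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}


# Hypothetical data: equal criterion scores, unequal test scores.
test = [4, 5, 6, 7, 8, 2, 3, 4, 5, 6]
criterion = [4, 5, 6, 7, 8, 4, 5, 6, 7, 8]
group = ["A"] * 5 + ["B"] * 5
print(mean_error_by_group(test, criterion, group))
```

A positive mean error indicates that the common line predicts too low a criterion score for that subgroup; in this constructed example it under-predicts group B and over-predicts group A.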

This  definition  of  test  bias  has  several  implications.  First,  a  test,  in 
and of itself, is not discriminatory. The use to which a test is put, however,
can be discriminatory (Tenopyr, 1967). Unless an outside criterion is applied,
a significant difference in mean test scores for different cultural or ethnic
groups cannot be presumed to be bias against one or more of the subgroups. It
is certainly not unreasonable to assume that the test is measuring a true
difference between subgroups on the test dimension or dimensions. Thus, to label
a test as discriminatory solely on the basis of difference in test performance
between the different subgroups indicates a misunderstanding or a definition of
the concept of test bias that differs from that used in the present investigation.

It  should  always  be  remembered  that  the  purpose  of  a  selection  test  is  to 
differentiate  between  those  job  applicants  who  will  be  good  performers  on  the 
job  and  those  who  will  be  poor  (Guion,  1966).  Only  if  an  outside  criterion,  a 
measure  of  job  performance,  is  applied  can  one  determine  whether  a  given 
selection test is biased or unbiased with respect to the different subgroups
comprising the applicant population. If differences in the test performance of
two groups are associated with group differences in the same direction on a job
performance measure, then the test is doing its job; i.e., it is differentiating
between good and poor performers on the job. A test in this particular situation
is unbiased with respect to different groups within the job applicant population
(Arvey, 1967). However, if group test performance differences are not associated
with group differences in job performance or are associated with group differences
in the opposite direction on the performance criterion, then the test is
discriminating in an unfair manner and can properly be labeled as biased.

Aside  from  the  legal  aspects  of  test  bias  in  selection  procedures,  the 
existence  of  such  bias  will  usually  result  in  a  selection  procedure  which  over- 
or  under-predicts  the  job  performance  of  certain  subgroup  members.  Thus,  the 
elimination  of  test  bias  is  desired  because  it  will  increase  the  practical 
efficiency of the selection procedure in screening out those job applicants who
will  not  be  successful  on  the  job  and  in  accepting  those  job  applicants  who  will 
be  successful. 

Bias  Reduction 

Several  alternatives  for  the  elimination  of  test  bias  are  possible.  First, 
psychological  tests  could  be  eliminated  from  the  selection  procedure.  However, 
this  alternative  would  perhaps  lead  to  increased  discrimination  in  the  selection 
procedure  because  such  devices  as  the  interview  and  application  blanks  used  in 
place  of  tests  may  be  even  more  subject  to  bias.  These  are  potentially  more 
discriminating in an unfair manner than tests and, with these less sophisticated
measures, bias may be even more difficult to detect or eliminate. If alternative
predictors which can be demonstrated to be superior to tests and free from bias
are developed, then tests may be replaced by those measures in the selection
procedure.

A  second  alternative  is  the  development  of  "culture-free"  tests.  Krug 
(1966) states that a truly culture-free test must meet one of two conditions:
a) all people of all cultures must have had equal opportunity and equal motive
to learn all items on the test, or b) all items possess complete novelty for
all people of all cultures. It is extremely unlikely that any test will ever
be constructed so as to meet either of these conditions. More promising are
several variants of culture-free tests, specifically culture-fair and
culture-equivalent tests. The assumption of a culture-fair test is that there exists
a set of test stimuli which are equally appropriate to at least two cultural
groups. In a culture-equivalent test, cultural counterparts of various test
items are developed (Krug, 1966).

However,  until  the  various  subcultures  within  the  major  culture  are  fully 
investigated  and  criteria  established  as  to  what  denotes  cultural  "fairness" 
or "equivalence" for these subcultures, it is doubtful that meaningful
contributions to the problem of test bias will be made with this approach (Lockwood,
1966). Guion (1966) stated that culture-free tests might be useful as an
indication  of  the  degree  of  cultural  deprivation  of  an  individual.  He  proposed 
to  do  this  by  comparing  test  scores  on  a  traditional  measure  of  intelligence 
and  on  a  culture-free  test.  The  difference  between  the  scores  (expressed  in 
standard  score  units)  would  be  a  measure  of  the  cultural  deprivation. 
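
A comparison of this kind might be sketched as follows. The direction of the subtraction (culture-free minus traditional), the norm values, and the function names are assumptions made for illustration only; the source does not specify them.

```python
def z_score(raw, norm_mean, norm_sd):
    """Express a raw score in standard score units relative to its norm group."""
    return (raw - norm_mean) / norm_sd


def score_discrepancy(culture_free_raw, culture_free_norms,
                      traditional_raw, traditional_norms):
    """Culture-free z minus traditional z (direction is an assumption)."""
    return (z_score(culture_free_raw, *culture_free_norms)
            - z_score(traditional_raw, *traditional_norms))


# Hypothetical individual: norms are given as (mean, standard deviation).
print(score_discrepancy(55, (50, 10), 85, (100, 15)))
```

For this hypothetical individual the culture-free score lies half a standard deviation above its norm mean and the traditional score one standard deviation below, a discrepancy of 1.5 standard score units.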

Tenopyr (1967) stated that the evidence suggests that the Negro job
applicant may be at a greater disadvantage when so-called "culture-fair" spatial tests
are used in selection than when verbal tests are utilized. Kirkpatrick, Ewen,
Barrett and Katzell (1967) found that non-verbal predictors were in general not
valid for the prediction of job performance of Negro female clerical workers
although they were valid for white female clerical workers. The evidence seems
to indicate that, although culture-free tests or their variants may be useful
in some situations or as supplementary instruments, they cannot be viewed as a
panacea  for  all  problems  associated  with  personnel  selection  from  culturally 
heterogeneous  job  applicant  populations. 

A third, perhaps more promising, approach to the elimination of test bias
is to investigate the relationship of the predictor and criterion measures
separately for each subgroup, i.e., to use subgroup membership as a moderator
variable. The term moderator variable was introduced by Saunders (1956) and
the concept has had many labels and many definitions (Banas, 1965). The
definition of moderator variable used in the present investigation, as suggested by
Banas (1965), is any variable, quantitative or qualitative, which improves the
usefulness of a predictor by isolating subgroups of individuals for whom a
predictor or set of regression weights is especially appropriate.
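
In practice, treating subgroup membership as a moderator amounts to computing the validity coefficient separately for each subgroup as well as for the combined sample. A minimal sketch follows; the helper names and the data are hypothetical and serve only to illustrate the computation.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def validity_by_subgroup(test, criterion, group):
    """Validity coefficient for the combined sample and for each subgroup."""
    result = {"combined": pearson_r(test, criterion)}
    for g in sorted(set(group)):
        xs = [x for x, gi in zip(test, group) if gi == g]
        ys = [y for y, gi in zip(criterion, group) if gi == g]
        result[g] = pearson_r(xs, ys)
    return result


# Hypothetical data: the test predicts the criterion in group A only.
test = [1, 2, 3, 4, 1, 2, 3, 4]
criterion = [1, 2, 3, 4, 2, 3, 3, 2]
group = ["A"] * 4 + ["B"] * 4
print(validity_by_subgroup(test, criterion, group))
```

In this constructed example the combined coefficient is substantial even though the test has no validity at all for group B, which is the kind of result a single combined analysis would conceal.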


Moderator  Variables  and  Validation 


The  moderator  variable  approach  has  been  advocated  by  many  investigators 
in  this  area.  Arvey  (1967)  has  stated  that  businesses  wishing  to  see  that 
Negroes get the jobs for which they are qualified should undertake sophisticated
validation  procedures  for  their  existing  tests  and  establish  different  norm 
groups  and  validity  coefficients  for  Negroes  and  whites.  Wallace,  Kissinger 
and  Reynolds  (1966)  have  recommended  that  all  tests  be  validated  in  the  setting 
where  they  will  be  used  and  validation  should  be  for  as  many  separate  groups  as 
possible  in  preference  to  one  large  heterogeneous  group.  Mitchell,  Albright 
and McMurray (1968), after failing to find either total sample or subgroup
validity  for  the  Wonderlic  Personnel  Test  with  a  supervisory  rating  as  the 
criterion  measure,  emphasized  the  need  for  subgroup  validation  research  in  all 
job  situations. 

Guion  (1965,  1966)  has  also  advocated  the  investigation  of  race  as  a 
moderator variable and has suggested that different expectancy tables be
developed for Negroes and whites in the job applicant population. Kirkpatrick,
et al. (1967), in their conclusions based upon a series of studies of
differential selection among applicants from different socio-economic or ethnic backgrounds,
stated  that  tests  should  be  validated  separately  for  each  ethnic  group  and  that 
either  different  standards  of  selection  or  different  selection  instruments  should 
be  used  with  different  ethnic  groups  in  most  instances. 

The  Equal  Employment  Opportunity  Commission  (1966)  has  also  stressed  the 
importance of validating a selection test for each minority group in the
population. Anastasi (1966), also advocating the use of moderator variables, stated
that moderator variables are of particular interest because of the widespread
concern regarding the use of tests with various subgroups of the general
population, especially culturally disadvantaged subgroups. She believes that the
empirical investigation of moderator variables in the interpretation of test
scores is a more constructive approach than the evasive procedures of so-called
culture-free  tests. 

Bartlett  and  O'Leary  (1969)  have  developed  a  differential  prediction  model 
to  moderate  the  effects  of  heterogeneous  groups  in  personnel  selection  and 
classification.  Several  situations  have  been  described  in  which  subgroup  test 
bias  has  been  or  could  be  found.  These  situations  have  been  labeled  1)  equal 
validity  and  unequal  means;  2)  differential  validity;  3)  opposite  validity; 
and 4) no validity in subgroups. Each of these general categories can be further
divided into subcategories describing the specific relationship between the
predictor and criterion measures for each subgroup. A survey of the literature in
the area of personnel selection from a heterogeneous applicant population reveals
the need for the use of such a differential prediction model in a selection
procedure.

Literature  Review 

The following literature review has been organized by following the
terminology suggested by Bartlett and O'Leary (1969).

1. Equal validity and unequal means. In this situation the predictor test
yields equal validity for the subgroups but differential mean performance on the
test or criterion exists. This typically results in a lower validity if the
subgroups are combined. Conversely, separate prediction for the subgroups would
lead to increased validity. An exception to this would be where the predictor
and criterion mean differentials are in the same direction; i.e., group X is
superior to group Y on both the predictor and criterion measures. In this
particular situation the test is not biased since it reflects a real difference
in predicted performance. (See Figure 1 in Appendix A for an illustration of
this relationship.¹)

Cleary (1966) has reported a study in which equal validity but unequal means
on  both  the  predictor  and  criterion  were  found.  Attempting  to  predict  first  year 
college  grade  point  average  at  a  state  supported  institution  in  the  Southwest, 
Cleary  found  that  the  non-white  group  had  lower  mean  scores  on  both  the  predictor 
(Scholastic  Aptitude  Test)  and  the  criterion  (grade  point  average)  but  that  the 
separate  validities  of  the  white  and  non-white  groups  were  approximately  equal. 
Combining  these  two  groups  for  purposes  of  prediction  would  probably  lead  to 
increased  validity  due  to  the  increased  heterogeneity. 

Although the Cleary (1966) study is a case in which validity of prediction
could  be  increased  by  combining  groups,  most  other  situations  would  result  in 
reduced  validity. 


¹ Figures 1 through 11 in Appendix A are offered as illustrative models.
They  are  not  intended  to  literally  represent  the  bivariate  distributions  or 
correlations  cited. 

Kirkpatrick, Ewen, Barrett and Katzell (1967), studying white and
non-white groups (both from culturally deprived backgrounds) who were participating
in a heavy vehicle driver training program for the unemployed in New York City,
found a significant difference in favor of the white group on the mean predictor
scores, yet no significant difference on the criterion measures. Predictors
used were the Gates Reading Survey and the Numerical Ability Test of the
Differential Aptitude Tests; the criteria were graduation vs. termination in the
training program and scores on verbal proficiency tests in the training program.
If these two groups were combined, not only would a lower validity result but
the non-white group would not be as likely to be selected if a cutting score
based on the combined group were used. Since both groups had essentially equal
chances of success, test discrimination would result if the groups were combined.
However, by including race as a moderator, better prediction of criterion
performance would be possible as well as the elimination of racial discrimination.
(See Figure 2 in Appendix A.)

Kirkpatrick, et al. (1967) also report such a relationship between
predictor and criterion measures with a sample of 493 white and 98 Negro female
clerical workers in several insurance companies. In this concurrent validation
study, the Negro group performed more poorly than the white group on all but
one part of a clerical selection test battery, but no differences existed on
either criterion measure, salary and supervisory ratings. The validities
obtained were essentially the same for both groups. Although methodological
problems prevented any conclusive statements about test bias in this situation,
the  data  suggested  that  bias  in  the  predictor  test  battery  might  exist. 

Other situations are possible where equal validities but unequal means
would lead to both poorer selection decisions and test bias if the subgroups
were combined. First is the case where there is a difference between groups
on criterion performance, yet no difference in test performance. (See Figure
3 in Appendix A.) This would result in overestimation of the chance of success
for one group and underestimation for the other. The existence of differences
in mean performance on both the criterion and predictor, but in opposite
directions, is another possible situation (see Figure 4 in Appendix A). If the
two groups were combined, although positive validities existed for each group
separately, an overall negative correlation would result. If personnel
decisions were made on the basis of a regression equation for the combined groups,
the worst from each group would be selected!
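
The last situation (means differing in opposite directions) can be reproduced with a small constructed example. The applicants, scores, and function names below are hypothetical; the sketch only shows how a combined regression equation can come to favor the poorest performer within each group.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical applicants: (group, test score, criterion score).
# Within each group, higher test scores go with higher criterion scores.
applicants = [("X", 6, 1), ("X", 7, 2), ("X", 8, 3),
              ("Y", 1, 6), ("Y", 2, 7), ("Y", 3, 8)]

# Regression equation fitted to the combined groups.
slope, intercept = fit_line([a[1] for a in applicants],
                            [a[2] for a in applicants])


def top_choice(group):
    """Group member with the highest criterion score predicted by the combined line."""
    members = [a for a in applicants if a[0] == group]
    return max(members, key=lambda a: intercept + slope * a[1])


def poorest_performer(group):
    """Group member with the lowest actual criterion score."""
    members = [a for a in applicants if a[0] == group]
    return min(members, key=lambda a: a[2])


print("combined slope is negative:", slope < 0)
for g in ("X", "Y"):
    print(g, top_choice(g) == poorest_performer(g))
```

Because the combined slope is negative, the equation prefers the lowest test scorer in each group, who in these data is also that group's poorest performer.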




2. Differential validity. A selection test may be valid for one
subgroup in an applicant population and not valid for another, or the validities
may  be  of  different  magnitude  or  even  different  direction  of  relationship. 

In  a  study  of  female  toll  collectors,  Lopez  (1966)  found  differential 
validity  for  the  subgroups  but  no  differences  in  mean  performance  on  either 
the  criterion  (absences)  or  the  predictor  (Clerical  Aptitude  Test  of  the 
Differential  Aptitude  Tests).  (See  Figure  5  in  Appendix  A  for  an  illustration 
of this relationship.) Lopez found no validity (r = +.01) for the white group,
a significant correlation (r = -.18, p<.01) for the Negro group, and no
validity for the combined group (r = -.03). With the same sample Lopez (1966)
also found both differential validity and differential mean predictor
performance (with an interview check list as the predictor) but no significant
differences in mean criterion performance (see Figure 7 in Appendix A). Again
Lopez reported no validity for the white sample (r = +.02), low but significant
validity for the Negro group (r = -.14, p<.01), and no validity for the combined
group  (r  =  -.07).  It  should  be  noted  that  the  correlations  reported  have  been 
corrected  for  restriction  of  range.  Whether  the  uncorrected  correlations  were 
significant  was  not  reported. 
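
The source does not state which correction Lopez applied. As an assumption for illustration, the sketch below uses one widely used formula for direct restriction on the predictor (often referred to as Thorndike's Case 2); the function name and the standard deviations are hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Estimate the unrestricted correlation from a restricted-sample correlation,
    assuming direct selection on the predictor (an assumption, not Lopez's
    documented procedure)."""
    k = sd_unrestricted / sd_restricted
    r2 = r_restricted ** 2
    return (r_restricted * k) / math.sqrt(1.0 - r2 + r2 * k * k)


# Hypothetical values: predictor SD of 15 among applicants, 10 among those hired.
print(round(correct_for_range_restriction(-0.10, 15.0, 10.0), 3))
```

Because the correction enlarges the coefficient in absolute value whenever the selected group is less variable than the applicant group, a corrected coefficient may appear significant where the uncorrected one was not.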

Cleary  (1966),  investigating  academic  prediction,  reported  significant 
mean  differences  favoring  the  white  group  on  both  the  predictor  (Scholastic 
Aptitude Test - Mathematics) and the criterion (first year grade point average)
but she also found differential validity. Cleary reported a significant
validity coefficient (r = .25, p<.05) for the white group but no significant
correlation (r = .01, n.s.) for the non-white group. Thus, this predictor would
be appropriate for the white group but not for the non-white or the combined
group. Although a valid prediction could be made from this test for the
combined group, this was possible only because the test identified the lower
performing  group  of  non-whites  (see  Figure  8  in  Appendix  A). 

Kirkpatrick, Ewen, Barrett and Katzell (1967) studied several job
situations involving many different selection tests and criteria in an attempt to
provide  evidence  in  an  industrial  setting  concerning  possible  test  bias  in 
selection  procedures.  They  found  differential  validity  in  a  number  of  different 
job  situations. 


With a sample of 102 white and Negro female clerical workers,
Kirkpatrick, et al. (1967) reported a validity coefficient of .21 (p<.05) for
the combined group, using as a predictor the Numerical Test of the Short
Employment Test and a merit rating criterion. For the white group the
validity coefficient was .25 (p<.05), but for the Negro group it was .02. In
another study reported by Kirkpatrick, et al. of 137 males in a General
Maintenance Training program (31 white, 53 Negro and 53 Spanish), differential
validity was also found. Using the Gates Reading Survey as the predictor and
proficiency task scores as the criterion, they obtained a significant validity
coefficient (r = .29, p<.01) for the combined group, a significant coefficient
(r = .42, p<.01) for the Negro group, yet no validity for either the white
group  (r  =  .02)  or  the  Spanish  group  (r  =  .07).  The  correlations  reported 
between  the  same  predictor  (Gates  Reading  Survey)  and  a  termination  criterion 
were  .19  (n.s.)  for  the  combined  group,  .08  (n.s.)  for  the  white  group,  .31 
(p<.05)  for  the  Negro  group,  and  .30  (p<.05)  for  the  Spanish  group.  The 
mean  performance  on  the  Gates  was  significantly  (p  < .01)  lower  for  both  the 
Negro  and  Spanish  groups  than  the  white  group.  There  were  no  significant 
differences on the termination criterion but the Spanish group performed
significantly lower than the white group on the proficiency tasks (p<.01).

Kirkpatrick,  et  al.  (1967)  also  reported  a  study  using  nursing  students 
as the sample and validating a test battery (Pre-Nursing and Guidance
Examination developed by the National League for Nursing) against a criterion consisting
of a set of state licensing examinations. There were five examinations: medical
nursing, surgical nursing, obstetrical nursing, pediatric nursing, and psychiatric
nursing. The criterion examination appeared to be unbiased as no consistent
pattern of mean performance scores emerged; i.e., whites were superior on two of
the exams, Negroes were superior on one and there were no differences on two of
the examinations. Inspection of the correlation matrix of the nine subscores
on the PNG test battery and the five state examinations revealed 34 instances
where validity existed for the combined and white groups but not for the Negro
group; five instances where validity existed only for the white group but not
for the combined or Negro groups; and six instances in which validity for all
groups existed. This large percentage of cases in which differential validity
was found indicates that this situation is perhaps all too common in selection
situations. 

Ruda and Albright (1968) found that the correlations between the Wonderlic
and a turnover criterion were -.26 (r_bis, p<.01) for the combined group, -.3^
(r_bis, p<.01) for the white group, and +.10 (r_bis, n.s.) for the Negro
group. The sample consisted of 1^7 white and 91 Negro clerical workers. Since
there  are  questions  about  the  appropriateness  of  testing  a  biserial  correlation 
for  significance,  the  present  authors  calculated  the  point  biserial  correlations 
for  each  of  the  above  relationships  and  tested  them  for  significance.  The  total 
group  and  white  sample  correlations  were  again  found  significant  and  the  Negro 
sample  correlation  was  not  significant. 
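
The point-biserial coefficient is the ordinary product-moment correlation between a dichotomous variable (for example, stayed vs. terminated) and a continuous score, which is why the usual significance test applies to it. A minimal sketch follows; the function name and the data are hypothetical and do not reproduce the Ruda and Albright figures.

```python
import math


def point_biserial(dichotomy, scores):
    """Point-biserial r between a 0/1 variable and a continuous score."""
    n = len(scores)
    ones = [s for d, s in zip(dichotomy, scores) if d == 1]
    zeros = [s for d, s in zip(dichotomy, scores) if d == 0]
    p = len(ones) / n  # proportion coded 1
    mean_all = sum(scores) / n
    # Standard deviation of all scores, computed with n in the denominator.
    sd_all = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd_all * math.sqrt(p * (1.0 - p))


# Hypothetical data: 1 = stayed on the job, 0 = terminated.
stayed = [0, 0, 0, 1, 1, 1, 0, 1]
test_scores = [3, 4, 5, 6, 7, 8, 6, 5]
print(round(point_biserial(stayed, test_scores), 3))
```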

In  all  of  the  above-mentioned  studies,  it  is  apparent  that  the  predictors 
used  were  not  appropriate  for  all  of  the  subgroups  within  the  population.  This 
points  to  the  need  for  the  development  and  use  of  valid  predictors  for  each  of 
the  subgroups  within  a  heterogeneous  job  applicant  population. 

3. Opposite validity. Lopez (1966) has reported a case where a test had
significant positive validity for one group and significant negative validity
for another. There were no significant differences in mean test performance
(see Figure 9 in Appendix A). With a sample of toll collectors, Lopez reported
a validity coefficient of .19 (p<.01) for the white group between the
Clerical Aptitude Test of the Differential Aptitude Tests and a criterion of tolls
accuracy, yet a corresponding correlation of -.23 (p<.01) for the non-white
group. Thus, the use of this test for selection purposes with a combined group
would  have  no  validity.  Only  through  the  use  of  subgroup  analyses  could  the 
proper  interpretation  of  test  performance  be  made;  i.e.,  one  should  hire  whites 
who  have  a  high  score  but  non-whites  who  have  a  low  score  on  the  test.  Lopez 
(1966)  also  reported  a  similar  situation  where  a  mental  ability  test  correlated 
in  opposite  directions  for  two  racial  groups  but  in  this  case  the  white  group 
was  superior  in  test  performance  (see  Figure  10  in  Appendix  A).  There  was  no 
significant  difference  in  criterion  (tolls  accuracy)  scores.  The  correlation 
for  the  white  group  was  .1 6  (p<;.0l),  but  -.18  (p-i.0l)  for  the  non-white  group. 


Either differential or non-linear prediction would result in validity in this
situation, but with the combined group no linear prediction would be possible.
Again it should be noted that the correlations reported by Lopez were corrected
for restriction of range and the significance of the uncorrected correlations
is not known.

4. No validity in subgroups. It is possible that a test which is valid
for a combined group is not valid for any of the subgroups within the
population. This could occur if significant differences exist in the same
direction on both the predictor and criterion measures (see Figure 11 in
Appendix A). This effectively means that the selection procedure is based upon
the use of a variable, for example race or socio-economic class, that is not
related to job performance. Since the test is valid for neither Group X nor
Group Y, it should not be used in any way to influence personnel decisions.
The validity of the combined group would be based only upon the fact that the
two groups differed in performance. The test in this case is actually only a
crude measure of the dimension on which the groups differ; for example, race.
Failure to consider through appropriate analyses the validity in the subgroups
would result in inadvertent racial discrimination through the personnel
testing program.
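
The situation just described can be made concrete with a small constructed
example. The Python sketch below uses invented scores (they are not data from
any study cited here) for two hypothetical groups, X and Y. Within each group
the test is unrelated to the criterion, but Group Y is higher on both
measures, so the combined-group correlation is large. The function name
pearson_r and all values are assumptions made for illustration only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented scores: within Group X the test is unrelated to the criterion.
test_x, crit_x = [0, 1, 0, 1], [0, 0, 1, 1]
# Group Y is shifted upward by 3 points on BOTH the test and the criterion.
test_y = [t + 3 for t in test_x]
crit_y = [c + 3 for c in crit_x]

r_x = pearson_r(test_x, crit_x)                            # within Group X
r_y = pearson_r(test_y, crit_y)                            # within Group Y
r_combined = pearson_r(test_x + test_y, crit_x + crit_y)   # groups pooled

print(r_x, r_y, r_combined)
```

In this constructed case both within-group correlations are zero while the
pooled correlation is .90, which is the pattern the text warns about: the
apparent validity reflects only the group difference.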

Kirkpatrick, Ewen, Barrett and Katzell (1967) reported several cases in
which no validity in subgroups was found. However, none of the data exactly
fits the above model. In particular, in none of the cases reported do the
groups differ on both the criterion and predictor variables. All of the
reported cases involved a sample of 39 white and 33 Negro clerical workers.

Using a vocabulary test as the predictor, a correlation of .2‘j (p < .05) was
found for the combined group with a rating of quality of work as the criterion.
However, the equivalent correlation for the white group was .2‘j (n.s.) and .19
(n.s.) for the Negro group. With the same predictor, correlations with a rating
of overall performance were .27 (p < .05) for the combined group, ,2b (n.s.) for
the white group and .30 (n.s.) for the Negro group. The vocabulary test
correlated with a rating of overall effectiveness .30 (p < .05) for the combined
group, .28 (n.s.) for the white group and .26 (n.s.) for the Negro group. A
significant difference (p < .01) in the mean rating of overall effectiveness was
the only significant difference found in any of the predictor or criterion
measures. The significance of the combined group correlation appears to be
only a function of the sample size. With larger samples, it would be likely
that validity in the subgroups as well as in the combined group would be found.

This  survey  of  the  literature  concerned  with  the  problem  of  prediction  of 
job  success  for  heterogeneous  job  applicant  populations  indicates  that  dis¬ 
crimination  in  personnel  selection  tests  has  been  found  in  a  variety  of 
occupational  situations.  One  can  only  conclude  that  the  proper  consideration  of 
this  problem  is  a  necessity  for  an  adequate  test  validation  procedure. 

It should not be inferred from the preceding literature survey that all
personnel tests are biased against or for minority group members. Studies
have been reported in which no test discrimination was found (Tenopyr, 1967;
see also Kirkpatrick, et al., 1967). A report of the APA Task Force on
Employment Testing of Minority Groups (1969) states that no clear trends have
been established concerning the existence of bias in predicting job
performance and that no firm conclusions are possible. Thus the present
investigation is an attempt to provide more evidence as to the degree of
pervasiveness of test bias in personnel selection procedures.




Section  II:  General  Method 


Seven  independent  studies  are  reported  which  employed  similar  methodology. 
This  section  provides  an  overview  of  the  research  effort  to  limit  the  amount  of 
redundancy  that  would  occur  if  all  phases  of  each  study  were  separately  de¬ 
scribed  in  detail. 

Subjects 

As the purpose of this phase of the research project was to investigate
existing predictor-criterion relationships in job situations, the subjects in
all studies were current on-the-job employees or, in the case of correctional
institution inmates, members of existing situational groups. Thus, the
samples all consisted of pre-selected groups of individuals. Each sample
consisted of those persons who had been members of the group under study
for at least three months. To assure as large a sample size as possible,
a maximum tenure length was not used as a restrictive criterion for inclusion
in the sample; i.e., no attempt was made to develop a relatively homogeneous
sample with respect to tenure by setting a maximum length-of-service cutting
point. The effects of tenure upon the predictor-criterion relationships
were statistically controlled when deemed necessary.

Predictors 

All predictors were psychological tests which were a part of the existing
selection procedure. Most of the tests were used as explicit selection devices
though some had been included only for experimental purposes. All of the actual
test administration was conducted by the personnel of the organization
furnishing the data. In most instances, the subjects in a given sample were
not tested at the same time and by the same administrators due to tenure
differences.

Criteria 

A number of criterion measures were used in each study. Most criteria
were already existing measures of job performance, but in some cases the
measures were developed by the investigators. In all studies an attempt was
made to have criteria which measured a wide sample of job performance
behaviors. This was limited in certain situations by the record systems of
the organizations and other practical considerations.


Statistical  Analyses 

Means and standard deviations of all predictors and criterion variables
were computed for the total sample, the white subgroup and the Negro subgroup.
The significance of the difference between the mean predictor performance of
the two subgroups was tested by means of the t test. Similar tests were
computed for the mean criterion performance of the two subgroups. It should
be noted that the distributions of some variables are rather skewed. A
basic assumption of the t test is normality of the underlying distribution
of the populations. However, Boneau (1960) has shown that the t test is
relatively insensitive to violations of its assumptions. Hays (1963) states
that the assumption of normality may be violated "almost with impunity
provided that sample size is not extremely small" (p. 322). A more serious
problem is the interaction of the effects of unequal sample sizes and
heterogeneity of the two sample variances. If an F test of the ratio of the
sample variances revealed heterogeneity, the correction suggested by Welch
(1947) was applied.
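
The mean-difference tests described above can be sketched from summary
statistics alone. The Python sketch below is illustrative only: the report
does not state the exact computational form of the Welch correction that was
applied, so the familiar Welch-Satterthwaite approximation to the degrees of
freedom is assumed here, and the function names (pooled_t, welch_t,
variance_ratio) are hypothetical. The input values are the Arithmetic
Reasoning Test means, standard deviations and N's given in Table 2.

```python
from math import sqrt

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t for two independent means, assuming equal variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t with Welch-Satterthwaite approximate degrees of freedom
    (assumed form of the correction; not confirmed by the report)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

def variance_ratio(s1, n1, s2, n2):
    """F ratio of the larger to the smaller sample variance, with its df."""
    if s1 >= s2:
        return s1**2 / s2**2, n1 - 1, n2 - 1
    return s2**2 / s1**2, n2 - 1, n1 - 1

# (mean, standard deviation, N) for the Arithmetic Reasoning Test, Table 2.
white = (94.88, 4.54, 101)
negro = (91.99, 5.74, 42)

t_pooled, df_pooled = pooled_t(*white, *negro)
t_welch, df_welch = welch_t(*white, *negro)
f_ratio, df_num, df_den = variance_ratio(white[1], white[2], negro[1], negro[2])

print(round(t_pooled, 2), df_pooled)
print(round(t_welch, 2), round(df_welch, 1))
print(round(f_ratio, 2), df_num, df_den)
```

With these inputs the equal-variance t is close to the value reported in
Table 2, while the Welch form gives a somewhat smaller t with fewer degrees
of freedom. Whether the variance ratio counts as heterogeneous depends on the
F criterion chosen, which the report does not specify.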

The validity of each predictor for each criterion measure was estimated
by computing zero-order correlations for all possible predictor-criterion
pairs for each sample. Validity coefficients were computed for the total
sample, the white subgroup and the Negro subgroup. In those samples in which
more than one predictor had been used, multiple correlations were not
computed because of the instability of such statistics with samples of the
relatively small size (in relation to the number of predictors) that existed
in the present investigation. Furthermore, the subgroup sizes, especially
of the Negro subgroup, were not large enough to permit the use of
cross-validation procedures.

Comparisons of each predictor-criterion relationship for the white and
Negro subgroups were made by three methods of analysis. First, the
significance of the validity coefficients for both subgroups was examined.
Second, tests of the significance of the difference between the two subgroup
validity coefficients were computed. Third, the regression tests of the
analysis of covariance (Potthoff, 1966) were computed to test the equality of
the regression slopes and intercepts for the two subgroups for each
predictor-criterion pair. This procedure results in three separate F ratios.
F1 simultaneously tests the hypothesis that both the regression slopes and
the intercepts are equal for the two groups. If F1 is significant one may
conclude that bias exists. F2 tests the hypothesis that the regression slopes
are equal for the two groups. F3 tests the hypothesis that a common intercept
is appropriate for the two groups. F3 is an appropriate test only when F2 is
not significant.
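
The three F ratios can be illustrated by comparing the residual sums of
squares of three nested regression models: separate lines for each group,
parallel lines with separate intercepts, and one common line. The Python
sketch below follows the standard analysis-of-covariance formulation. It is an
assumption that this matches Potthoff's (1966) computations in every detail,
in particular the choice of error term for the intercept test. The function
names and the data are invented for illustration.

```python
def _corrected_sums(x, y):
    """Return n and the mean-corrected sums Sxx, Sxy, Syy for one sample."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return n, sxx, sxy, syy

def regression_f_tests(x1, y1, x2, y2):
    """F ratios (value, df1, df2) for equality of two regression lines."""
    n1, sxx1, sxy1, syy1 = _corrected_sums(x1, y1)
    n2, sxx2, sxy2, syy2 = _corrected_sums(x2, y2)
    n, sxx, sxy, syy = _corrected_sums(list(x1) + list(x2),
                                       list(y1) + list(y2))
    # Residual sums of squares for the three nested models.
    sse_separate = (syy1 - sxy1**2 / sxx1) + (syy2 - sxy2**2 / sxx2)
    sse_parallel = (syy1 + syy2) - (sxy1 + sxy2) ** 2 / (sxx1 + sxx2)
    sse_common = syy - sxy**2 / sxx
    ms_separate = sse_separate / (n - 4)
    # F1: slopes and intercepts both equal (common line vs. separate lines).
    f1 = ((sse_common - sse_separate) / 2) / ms_separate
    # F2: equal slopes (parallel lines vs. separate lines).
    f2 = (sse_parallel - sse_separate) / ms_separate
    # F3: equal intercepts given equal slopes (common vs. parallel lines);
    # the parallel-lines residual is assumed as the error term here.
    f3 = (sse_common - sse_parallel) / (sse_parallel / (n - 3))
    return {"F1": (f1, 2, n - 4), "F2": (f2, 1, n - 4), "F3": (f3, 1, n - 3)}

# Invented data: opposite slopes in the two groups, with small fixed noise.
x = list(range(10))
noise = [0.5, -0.5] * 5
group_a = (x, [xi + e for xi, e in zip(x, noise)])
group_b = (x, [-xi + e for xi, e in zip(x, noise)])

results = regression_f_tests(*group_a, *group_b)
for name, (f, df1, df2) in results.items():
    print(name, round(f, 2), df1, df2)
```

Each F ratio would be referred to the F distribution with the degrees of
freedom returned alongside it. In the invented example the slopes are
opposite in sign, so F1 and F2 are very large and, consistent with the text,
F3 would not be interpreted.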

These three methods of analysis actually constitute two different
approaches to the comparison of the validity of a test in two different
ethnic subgroups (Kirkpatrick, Ewen, Barrett, and Katzell, 1968). The
first approach involves testing the null hypothesis that the validity
coefficient for a given test and criterion is equal to .00 (for one or both
of the subgroups). Three possible results exist with this approach. The test
may be found to be valid for neither, both, or one of the subgroups. If the
test is found to be valid for neither subgroup, nothing can really be said
about differences in validity since the test is inappropriate in this
situation. If the test is found to be valid for both of the subgroups, then
it can be appropriately used with both subgroups to predict job performance.
If the test is found to be valid for one subgroup but not for the other,
there exists a difference in utility in that one may have more confidence
that the test is validly useful in one ethnic subgroup than in the other.

The alternate approach to the comparison of the validity of a test in
two different ethnic subgroups is to test the significance of the difference
between the validity coefficients of the two subgroups. This approach tests
the hypothesis that the two subgroups are drawn from the same population with
respect to the degree of validity. Rejection of the null hypothesis would
denote differential validity, while failure to reject would denote uniform
validity for the two subgroups. It is possible that the second approach may
fail to show a difference at a given level of confidence while the first
does. This can occur because of differences between the two approaches with
respect to both degrees of freedom and the sampling error associated with the
test of significance. Kirkpatrick, et al. (1968) have indicated that the
useful conclusion in this situation is one of a difference in significant
validity, in that one might use the test with some confidence to select
members of one ethnic subgroup but not of the other.
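
The two approaches can be placed side by side in a short Python sketch. The
report does not name the statistic used to compare two subgroup correlations,
so the usual Fisher r-to-z test for independent correlations is assumed here,
and the function names are hypothetical. The input values are the Arithmetic
Reasoning Test and periods-of-absence correlations reported for Study 1
(r = .09, N = 97 for the white sample; r = -.33, N = 40 for the Negro sample).

```python
from math import atanh, erfc, sqrt

def t_for_r(r, n):
    """First approach: t statistic (df = n - 2) for H0: correlation = .00."""
    return r * sqrt(n - 2) / sqrt(1 - r**2)

def fisher_z_difference(r1, n1, r2, n2):
    """Second approach: z statistic and two-tailed p for H0: the two
    independent correlations are equal (Fisher r-to-z, an assumed method)."""
    z = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = erfc(abs(z) / sqrt(2))  # two-tailed normal probability
    return z, p

r_white, n_white = 0.09, 97
r_negro, n_negro = -0.33, 40

t_white = t_for_r(r_white, n_white)   # compare with t tables, df = 95
t_negro = t_for_r(r_negro, n_negro)   # compare with t tables, df = 38
z_diff, p_diff = fisher_z_difference(r_white, n_white, r_negro, n_negro)

print(round(t_white, 2), round(t_negro, 2))
print(round(z_diff, 2), round(p_diff, 3))
```

The first two statistics answer whether each subgroup coefficient differs
from zero; the last answers whether the two coefficients differ from each
other. As the text notes, the two questions need not have the same answer.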


It  is  also  possible  for  the  first  approach  to  show  no  validity  in  either 
subgroup  but  the  second  to  show  a  significant  difference  between  the  validity 
coefficients.  This  can  occur  if  one  of  the  coefficients  is  positive  and  the 
other  is  negative.  Again,  the  practical  interpretation  is  to  use  the  test 
with  neither  subgroup.  In  this  series  of  studies,  both  methods  of  comparing 
validity  in  different  ethnic  subgroups  have  been  employed  and  reported,  but 
primary  attention  has  been  paid  to  the  outcomes  of  the  first  because  of  its 
practical  implications.  The  analysis  of  covariance  for  homogeneity  of  regres¬ 
sion  essentially  may  be  categorized  as  utilizing  the  second  approach  but  was 
also  employed  as  a  further  means  of  analysis  because  of  its  ability  to  detect 
regression  intercept  differences. 

Model  Identification 

Predictor-criterion  relationships  were  analyzed  using  the  Bartlett  and 
O'Leary  differential  prediction  model  in  an  attempt  to  determine  the  relative 
frequency  of  the  different  models. 

In accord with the above mentioned methods of analysis, two separate methods
of model identification were utilized in those situations where differential
validity was demonstrated for the two racial groups (Models 5-10). All
predictor-criterion relationships in which a validity coefficient was
significant for one racial group, but not significant for the other, were
identified as illustrations of models when the first method of model
identification was used. Because of the rather large difference in sample
size between the two racial groups, this procedure identified as models those
relationships in which the absolute magnitude of the nonsignificant
correlation for the Negro sample was larger than the corresponding significant
correlation for the white sample. These cases have been identified as
illustrations of models since it is difficult to justify the use of the test
for the Negro sample. However, there is some justification in using the test
for the white sample even though the absolute magnitude of the validity
coefficient is smaller than for the Negro sample.

The second method used to identify illustrations of models imposed the
additional criterion of a statistically significant difference between the
validity coefficients for the two racial groups. This method tends to
identify clear illustrations of the various models. In each study reported
a distinction is made between the models which meet only the first criterion
and those models which meet both criteria.

Section III: Studies of Existing Selection Procedures

Study 1: Toll Collectors


Sample 

The  subjects  were  159  female  toll  collectors  (115  white  and  44  Negro) 
employed  at  the  five  toll  facilities  of  a  state  highway  department.  All 
employees  held  state  civil  service  classified  positions.  The  major  duties 
of  these  toll  collectors  are  to  determine  the  appropriate  toll  category 
for  each  vehicle;  to  collect  cash  or  toll  tickets  in  the  appropriate  amount 
from  each  vehicle;  and  to  make  change  when  necessary.  Table  1  presents 
biographical  information  on  these  employees. 


Table 1: Biographical Data - Toll Collectors

Variable        Group    Mean     s       N(1)   t(2)

Age             Total    33.86    10.32   152
                White    34.48    11.18   108
                Negro    32.32     7.70    44    1.35

Education       Total    11.68     1.02   152
(in years)      White    11.56     1.03   108
                Negro    11.98     0.95    44    2.32*

Tenure          Total    34.90    38.26   150
(in months)     White    36.69    41.16   112
                Negro    30.34    29.54    44    1.0(>

(1) Total N is less than 159 because of incomplete data for some subjects.
(2) t ratios are between the means of the white and Negro groups.
* p < .05

It can be seen from the data in Table 1 that the white and Negro groups
differed significantly only in education, the Negro group having attained a
higher educational level.

Predictor  Comparisons 

Two  tests,  both  developed  by  the  state  personnel  department,  have  been 
used  as  selection  devices  for  the  position  of  toll  collector.  Specifically, 
these  tests  were  a  Clerical  Checking  Test  and  an  Arithmetic  Reasoning  Test. 
Because  of  the  recent  application  of  these  tests,  the  number  of  subjects  for 
whom  data  was  available  was  considerably  diminished.  Table  2  presents  the 
predictor  means,  standard  deviations  and  tests  of  significance  of  mean 
differences  for  the  white  and  Negro  samples. 


Table 2: Predictors - Means, Standard Deviations, N's and Tests
of Significance of Mean Differences - Toll Collectors

Predictor      Group    Mean     s      N     t(1)

Clerical       Total    75.36    4.29   128
Checking       White    75.71    4.45    89
               Negro    74.56    3.84    39   1.39

Arithmetic     Total    94.03    5.08   143
Reasoning      White    94.88    4.54   101
               Negro    91.99    5.74    42   3.18**

(1) t ratios are between mean test performance for the white and Negro groups.
** p < .01

The white and Negro groups did not differ significantly in performance
on the Clerical Checking Test. However, the white group scored significantly
higher on the Arithmetic Reasoning Test than the Negro sample. The
intercorrelations of the two tests were .04 for the total sample, .09 for the
whites and -.17 for the Negro sample.

Criterion  Comparisons 

Several  measures  of  job  performance  were  utilized  in  this  study.  Atten¬ 
dance  records  for  three  months  were  obtained  from  the  records  of  the  state 
highway  department.  This  attendance  data  was  treated  in  two  ways.  First, 
the  raw  number  of  days  absent  from  the  job  was  used  in  the  analyses.  Also 
the  number  of  periods  of  absence  was  used,  e.g.,  three  consecutive  days 
absent  counted  as  one  period  of  absence,  but  three  nonconsecutive  days  absent 
counted  as  three  periods  of  absence. 
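
The distinction between days absent and periods of absence can be sketched as
follows. The Python example assumes that each absence is recorded as an
integer index of consecutive working days (so that adjacent working days
differ by one); that encoding and the function name are assumptions made for
illustration and are not described in the report.

```python
def count_absence_periods(absent_days):
    """Count runs of consecutive absent working days; each run is one period."""
    days = sorted(set(absent_days))
    periods = 0
    previous = None
    for day in days:
        # A new period starts whenever the day does not follow the previous one.
        if previous is None or day != previous + 1:
            periods += 1
        previous = day
    return periods

consecutive = [10, 11, 12]      # three consecutive days absent
scattered = [3, 7, 12]          # three nonconsecutive days absent

print(len(consecutive), count_absence_periods(consecutive))
print(len(scattered), count_absence_periods(scattered))
```

Both hypothetical employees have three days absent, but the first has one
period of absence and the second has three, matching the example in the text.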

Extension of the required probationary period and job termination were
also used as criteria. Every state civil service employee has a mandatory
six month probationary period during which he may be dismissed for almost
any reason his supervisor deems sufficient. This probationary period may
be extended for one more six month period if the supervisor desires more
time to decide if the employee should be permanently hired. Only one such
extension is allowed. This criterion was dichotomously scored, a "0"
representing extension of the probationary period and a "1" representing no
extension of the probationary period. The termination criterion was also
dichotomously scored, a "0" representing termination and a "1" representing
an employee still employed.

Table 3: Criteria - Means, Standard Deviations, N's, and Tests
of Significance of Mean Differences - Toll Collectors

Criterion         Group    Mean      s       N     t(1)

Attendance -      Total      3.27    5.43   153
Days              White      3.31    5.58   111
Absent            Negro      3.16    5.08    42    0.14

Attendance -      Total      1.61    1.96   153
Periods Absent    White      1.51    1.88   111
in 3 months       Negro      1.85    2.17    42    0.97

Termination       Total      0.85    0.35   157
                  White      0.87    0.34   114
                  Negro      0.81    0.39    43    0.94

Extension of      Total      0.82    0.39   147
Probation         White      0.81    0.39   106
                  Negro      0.83    0.38    41    0.28

Dollar            Total    150.40   22.71   120
Accuracy          White    151.85   22.03    94
                  Negro    146.51   24.35    35    1.18

Axle              Total    150.23   23.07   129
Accuracy          White    150.73   22.58    94
                  Negro    148.90   24.64    35    0.40

(1) t ratios are between means of white and Negro samples.

Two objective criterion measures were obtained for this sample, dollar
accuracy and axle accuracy. Dollar accuracy for a given toll collector was
measured in terms of the ratio of the total number of transactions in a
month that the toll collector completed to the amount of error (in dollars)
in the toll receipts turned in during that month. Axle accuracy was measured
by the ratio of the total number of transactions in a month to the
number of errors in axle count in that month. The toll collector must
count the number of axles to determine the proper toll category for trucks;
the number of axles is also automatically recorded by a treadle-type counter
for each toll booth. Because toll collectors from several facilities were
included in the sample, the accuracy measures were converted to T-scores
with a mean of 50 and a standard deviation of 10 before being grouped for
the analyses. The T-score for each collector was based on the distribution
of the accuracy measures for her facility only. This data transformation
was made to help control for extraneous situational variance in these
measures. The accuracy data for three months were used; the T-scores for a
subject for the three months were summed to provide a single measure of
each accuracy criterion.
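
The construction of the summed accuracy criterion can be sketched in Python.
Several details are assumptions, since the report does not give them: that
the T-scores were computed separately for each month within each facility,
that the population form of the standard deviation was used, and the record
layout itself. All names and values below are invented for illustration.

```python
from statistics import mean, pstdev

def t_scores(values):
    """Convert raw values to T-scores (mean 50, standard deviation 10)."""
    m, sd = mean(values), pstdev(values)  # pstdev is an assumed choice
    return [50 + 10 * (v - m) / sd for v in values]

def summed_accuracy_criterion(records):
    """records: (collector, facility, month, accuracy_ratio) tuples.
    Returns {collector: sum of that collector's monthly T-scores}."""
    groups = {}
    for collector, facility, month, ratio in records:
        groups.setdefault((facility, month), []).append((collector, ratio))
    totals = {}
    for members in groups.values():
        # Standardize within one facility (and, by assumption, one month).
        scores = t_scores([ratio for _, ratio in members])
        for (collector, _), score in zip(members, scores):
            totals[collector] = totals.get(collector, 0.0) + score
    return totals

# Invented accuracy ratios for two facilities over three months.
records = []
for month in (1, 2, 3):
    records += [("a", "north", month, 100.0), ("b", "north", month, 200.0),
                ("c", "south", month, 900.0), ("d", "south", month, 1100.0)]

totals = summed_accuracy_criterion(records)
print(totals)
```

Because each monthly T-score has a mean of 50, a criterion summed over three
months has a mean near 150, which agrees with the order of magnitude of the
Dollar Accuracy and Axle Accuracy means in Table 3. A facility in which every
collector had the same ratio would have a zero standard deviation and would
need separate handling.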

The criteria means, standard deviations and tests of significance of
mean differences for the white and Negro samples are presented in Table 3.
There were no significant differences between the Negro and white samples
on any criterion measure.

Validity 

The correlations between the predictors and criteria for the total
toll collector sample, the white subgroup and the Negro subgroup are shown
in Table 4. If a predictor-criterion relationship fits one of the models
proposed by Bartlett and O'Leary (1969), a number indicating the appropriate
reference figure in Appendix A is enclosed in parentheses beneath the Negro
subgroup correlation. The most striking fact evident from Table 4 is the
general lack of validity of either test.


Table 4: Predictor - Criterion Correlations
Toll Collectors (1, 2)

                            Clerical Checking Test   Arithmetic Reasoning Test
Criterion         Group        r        N               r        N

Attendance -      Total       -04      122             -02      137
Days Abs.         White       -03       85              09       97
                  Negro       -03       37             -21       40

Attendance -      Total       -Ob      122             -11      137
Periods Abs.      White        00       85              09a      97
                  Negro       -10       37             -33*      40
                                                       (7)

Termination       Total        06      127             -10      142
                  White        05       89             -IV      101
                  Negro        04       38             -03       41

Extension of      Total       -04      122              02      137
Probation         White       -06       86              09       98
                  Negro        01       36             -11       30

Dollar            Total       -lb      101             -ob      116
Accuracy          White       -25*      71             -03       83
                  Negro        04       30             -17       33
                              (5)

Axle              Total       -Ob      101             -07      116
Accuracy          White       -10       71             O 1       83
                  Negro        06       30             -lb       33

(1) Decimals are omitted.

(2) Number in parentheses below the correlation for the Negro sample
indicates the model illustrated (See Appendix A).

* p < .05

a Different from the Negro group correlation at the .05 level.


Models Illustrated


The relationship between the Arithmetic Reasoning Test and the attendance
criterion measured in periods of absence illustrates Model 7 (Figure 7
in Appendix A) of the Bartlett and O'Leary (1969) schema. Although no
significant differences on the criterion measure were found, the white sample
scored significantly higher than the Negro sample on the test. The test
was valid only for the Negro sample (r = -.33, p < .05); not for the white
sample (r = .09) or the total group (r = -.11). Thus, this test is not
appropriate for the prediction of this attendance criterion for the total
group or the white sample but it would be useful with the Negro sample.

Model 5 (Figure 5 in Appendix A) is illustrated by the relationship of
the Clerical Checking Test and the criterion of dollar accuracy. No
significant differences on either the predictor or the criterion were found.
However, validity was found only for the white sample (r = -.25, p < .05).
Hence, this test is not appropriate for the prediction of this accuracy
criterion for either the total group or the Negro sample. The test could
appropriately be used to predict performance on this measure for the white
sample.

If the more stringent criterion of a significant difference between the
subgroup correlations is imposed, only the relationship between the
Arithmetic Reasoning Test and the attendance criterion (periods of absence) is
illustrative of a model (Model 7, in particular). This result was also found
by the analyses of covariance for homogeneity of regression (Potthoff, 1966).
Table 5 presents the results of this method of analysis. The significant F1
statistic for the Arithmetic Reasoning - Attendance (Periods of Absence)
relationship indicated that a common regression line cannot be used to
predict both white and Negro subgroup performance. No significant F-ratios
were found for any other predictor-criterion pair.

It should be stressed that the identification of models is for illustrative
purposes only and extreme caution should be exercised in the interpretation
of the relationships reported. The number of significant correlations
(2 of a possible 36) was only slightly greater than expected by chance at
the .05 level.
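
The chance expectation mentioned above is simple arithmetic, shown here as a
short Python sketch. The binomial probability additionally assumes that the
36 significance tests are independent, which is not strictly true for
correlations computed on overlapping samples; it is included only as a rough
guide, and the function name is hypothetical.

```python
from math import comb

n_tests = 36      # correlations examined in Study 1
alpha = 0.05      # significance level
observed = 2      # significant correlations found

# Number of "significant" results expected if every true correlation is zero.
expected_by_chance = n_tests * alpha

def prob_at_least(k, n, p):
    """Binomial probability of k or more successes in n independent trials."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

p_two_or_more = prob_at_least(observed, n_tests, alpha)

print(round(expected_by_chance, 1))
print(round(p_two_or_more, 2))
```

About 1.8 significant correlations would be expected by chance alone, and
under the independence assumption two or more would occur in roughly half of
such sets of tests, which supports the caution urged in the text.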
Study 2: Correctional Officers


Sample 

The subjects consisted of 371 correctional officers (322 white and
49 Negro) at two state prisons. The major duties of the officers are to
maintain the security of the institution and to supervise the work
activities of the inmates. Biographical information for the officers is
presented in Table 6.

Table 6: Biographical Data - Correctional Officers

Variable        Group    Mean     s       N(1)   t(2)

Age             Total    37.38    10.29   358
                White    38.41    10.35   311
                Negro    30.55     6.64    47    6.88**

Education       Total    10.68     1.74   358
(in years)      White    10.52     1.72   311
                Negro    11.72     1.51    47    4.51**

Tenure          Total    58.56    57.24   355
(in months)     White    62.07    59.35   308
                Negro    35.57    32.92    47    4.48**

(1) Total N is less than 371 due to incomplete data on some subjects.

(2) t ratios are between the means of the white and Negro samples.

** p < .01

There were significant differences between the white and Negro samples
on all variables, the Negro officers being younger, having more years of
formal education and having been on the job for a shorter period of time.


Predictor  Comparisons 


The California Test of Mental Maturity (CTMM) is the sole predictor
used by the state personnel department to select correctional officers.
The means, standard deviations, and the test of significance of mean
differences are given in Table 7. As can be seen in Table 7, the white sample
scored significantly higher on the CTMM than the Negro sample.

Table 7: CTMM - Means, Standard Deviations, N's and Test of
Significance of Mean Differences - Correctional Officers

          Group    Mean     s      N     t(1)

CTMM      Total    78.93    6.14   248
          White    79.33    6.14   207
          Negro    76.91    5.83    41   2.32*

(1) t ratio is between the means of the white and Negro samples.

* p < .05

Criterion  Comparisons 


The criteria used with the correctional officer study were attendance
(days absent), extension of probationary period, promotion and supervisory
ratings. The attendance (days absent only) and extension of probationary
period criteria were identical to those described in the toll collector
study.

The promotion criterion was controlled for tenure by partial correlation
techniques. This criterion measure was dichotomously scored, a "0"
representing no promotion and a "1" representing a within-job-classification
promotion, i.e., an increase in grade from level one to level two of the
job classification.
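
Controlling a predictor-criterion correlation for tenure by partial
correlation can be sketched with the first-order partial correlation formula.
The Python example below is a sketch only: the report does not give the
zero-order correlations involving tenure, so the input values are invented,
and the function name is hypothetical.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Invented zero-order correlations, for illustration only:
r_test_promotion = 0.50    # test score with promotion
r_test_tenure = 0.50       # test score with tenure
r_promotion_tenure = 0.50  # promotion with tenure

controlled = partial_r(r_test_promotion, r_test_tenure, r_promotion_tenure)
print(round(controlled, 3))
```

When tenure is uncorrelated with both the test and the criterion, the partial
correlation equals the zero-order correlation; otherwise it removes the part
of the relationship attributable to tenure.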


Table 8: Criteria - Means, Standard Deviations, N's and
Tests of Significance of Mean Differences - Correctional Officers

Criterion       Group    Mean    s      N     t(1)

Attendance -    Total    1.89    5.51   371
Days            White    1.68    5.41   322
Absent          Negro    3.27    6.03    49   1.88

Extension of    Total    0.76    0.43   355
Probation       White    0.81    0.39   308
                Negro    0.43    0.50    47   4.93**

Promotion       Total    1.58    0.49   368
                White    1.60    0.49   319
                Negro    1.45    0.50    49   1.98*

Rating by       Total    3.43    0.44   371
Supervisor      White    3.45    0.44   322
                Negro    3.31    0.41    49   2.09*

(1) t ratios are between the means of the white and Negro samples.

* p < .05

** p < .01


Supervisory ratings were also obtained for the correctional officer
sample. The rating scale used was developed by the investigators. Recent,
detailed job descriptions were available in the state personnel department.
Specific job duty statements were written for the correctional officer job
classification on the basis of the job descriptions. The distribution of
the rating scales to the supervisors was handled by a member of the
personnel department of the state correctional department.

The  supervisor  rated  both  the  importance  of  the  job  duty  to  overall 
job  performance  (on  a  4-point  scale)  and  the  performance  of  each  of  his 
subordinates  on  each  job  duty  (on  a  5-point  scale).  The  final  rating  for 
an  employee  was  obtained  by  summing  the  performance  ratings  on  those  duties 
rated  as  important  and  then  dividing  by  the  number  of  items  rated  important. 
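
The scoring rule just described can be sketched in a few lines of Python. This is only an illustration: the report does not say which points on the 4-point importance scale counted as "important", so the cutoff `important_at_least=3` and the function name `final_rating` are assumptions introduced here.

```python
def final_rating(importance, performance, important_at_least=3):
    """Average the performance ratings over duties rated as important.

    importance  -- supervisor's importance rating per duty (4-point scale)
    performance -- subordinate's performance rating per duty (5-point scale)
    important_at_least -- ASSUMED cutoff; the report does not state it
    """
    kept = [perf for imp, perf in zip(importance, performance)
            if imp >= important_at_least]
    if not kept:
        raise ValueError("no duty was rated as important")
    # Sum of performance ratings on important duties, divided by their count.
    return sum(kept) / len(kept)


# Hypothetical employee rated on four duties.
print(final_rating([4, 2, 3, 1], [5, 1, 3, 2]))
```

Because the divisor is the number of duties rated important, employees whose supervisors weight the duties differently are still scored on the same 5-point metric.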

The means, standard deviations, and tests of significance of mean
differences for the criteria are presented in Table 8. It can be seen that
the Negro sample scored significantly lower than the white sample on three
of the criterion measures: extension of probation, promotion, and
supervisory rating. No significant differences were found on the
attendance criterion.
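
As a rough check on how a t ratio of this kind follows from the tabled summary statistics, the sketch below computes a pooled-variance t from means, standard deviations, and N's. The report does not state which t formula was used, so the pooled form is an assumption; the inputs are the supervisory rating figures from Table 8 (white: 3.45, 0.44, 322; Negro: 3.31, 0.41, 49).

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t ratio from summary statistics (pooled variance assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Supervisory rating row of Table 8: white sample vs. Negro sample.
t, df = pooled_t(3.45, 0.44, 322, 3.31, 0.41, 49)
print(round(t, 2), df)
```

Under this assumption the result agrees with the tabled value of 2.09 for the rating criterion. Not every tabled t is reproduced this closely by the pooled formula, which may reflect rounding of the tabled inputs or a different variance estimate.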

Validity 

The correlations between the CTMM and the various criteria for the
total group, white sample and Negro sample are presented in Table 9. A
perusal of Table 9 again shows a general lack of validity of the test for
any of the criterion measures.

The only significant correlation for the correctional officer study
was between the CTMM and the attendance criterion for the Negro sample
(r = .33, p < .05).
Table 9: Predictor - Criterion Correlations
Correctional Officers

                                   C.T.M.M.
Criterion            Group        r        N

Attendance -         Total       03       248
Days Absent          White      -02a      207
                     Negro       33*       41
                                 (7)

Extension of         Total      -03       248
Probation            White      -11       207
                     Negro       01        41

Promotion            Total      -08       248
(Controlled          White      -12       207
for Tenure)          Negro      -02        41

Rating by            Total       08       248
Supervisor           White       08       207
                     Negro      -01        41

(1) Decimals are omitted.

(2) Number in parentheses below the correlation for the Negro
sample indicates the model illustrated (See Appendix A).

* p < .05

a Different from the Negro subgroup correlation at the .05 level.
Models Illustrated


The relationship between the CTMM and the attendance criterion for
the correctional officer sample fits Model 7 of the Bartlett and O'Leary
(1969) schema (Figure 7 in Appendix A). There was a significant difference
on the predictor between the white and Negro samples but no difference on
the criterion measure. The test was a valid predictor for the Negro
subgroup but lacked validity for both the total sample and the white
subgroup. The correlation between the CTMM and the attendance criterion for
the Negro subgroup (r = .33) was significantly different from that for the
white subgroup (r = -.02) at the .05 level (z = 2.05). Thus, this
predictor-criterion relationship is also illustrative of Model 7 when the
additional criterion of a significant difference between the subgroup
validity coefficients is imposed.
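
The comparison of the two subgroup validity coefficients can be illustrated with Fisher's r-to-z transformation for independent correlations. The report does not name the procedure it used, so treating it as the Fisher test is an assumption; the inputs are the Table 9 values for the attendance criterion (Negro: r = .33, N = 41; white: r = -.02, N = 207).

```python
import math
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """z test for the difference between two independent correlations."""
    # Fisher transformation of each coefficient.
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


z, p = compare_independent_correlations(0.33, 41, -0.02, 207)
print(round(z, 2), p < 0.05)
```

With these inputs the statistic rounds to the z of 2.05 quoted in the text.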

The results of the analyses of covariance for homogeneity of regression
for the correctional officer sample are presented in Table 10. The
CTMM was found to be biased for the prediction of job performance as
measured by the attendance criterion if the total group regression equation
were used. The significant F2 statistic revealed that a common beta weight
could not be used with both subgroups. This was consistent with the results
of the comparison of the validity estimates for the two subgroups.

The analysis of covariance for homogeneity of regression also revealed
that the CTMM was biased for the prediction of the extension of probation
criterion. Although the CTMM had no validity for the prediction of this
criterion (r = -.03 for the total group; r = -.11 for the white subgroup;
r = .01 for the Negro subgroup), a common regression equation would
underestimate the job performance of the white subgroup but overestimate
the performance of the Negro subgroup, because the white subgroup scored
significantly higher than the Negro subgroup on both the predictor and
criterion measures. The significant F3 statistic revealed that a common
intercept value could not be used for the prediction of the extension of
probation criterion measure for the two subgroups.


Table 10: Analysis of Covariance for Homogeneity of Regression -
Correctional Officer Sample

                                       CTMM
Criterion         F1(1)     df1       F2(2)    df2       F3(3)     df3

Attendance -     10.47**  (2,244)     9.90**  (1,244)    10.6?    (1,245)
Days Abs.

Extension of     20.03**  (2,244)      .39    (1,244)    39.76**  (1,245)
Probation

Promotion          .53    (2,244)      .70    (1,244)      .35    (1,245)

Rating by         2.06    (2,244)      .25    (1,244)     3.88    (1,245)
supervisor

** p < .01

(1) F1 tests hypothesis that E(Y_ij | X_ij) = a + b X_ij for all i groups.

(2) F2 tests hypothesis that E(Y_ij | X_ij) = a_i + b X_ij for all i groups.

(3) F3 tests hypothesis that E(Y_ij | X_ij) = a + b_i X_ij for all i groups
(valid test only if F2 is not significant).
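
The three F ratios used in this analysis can be sketched as comparisons of nested regression models. The sketch below is an assumed reconstruction, not the authors' program: F1 compares a single regression line with separate lines per group, F2 compares parallel lines with separate lines, and F3 compares a single line with parallel lines. For two groups and N = 248 this gives degrees of freedom (2,244), (1,244), and (1,245), matching Table 10. The function names and the illustrative data are hypothetical.

```python
def _sums(x, y):
    """Centered sums of squares and cross-products for one set of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxx, sxy, syy


def homogeneity_of_regression(groups):
    """groups: list of (x, y) pairs of equal-length lists, one pair per subgroup.

    Returns {"F1": (F, df), "F2": (F, df), "F3": (F, df)}.
    """
    k = len(groups)
    n = sum(len(x) for x, _ in groups)
    within = [_sums(x, y) for x, y in groups]

    # Separate slope and intercept in every group.
    sse_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in within)
    # Common slope, separate intercepts (parallel lines).
    sse_parallel = (sum(w[2] for w in within)
                    - sum(w[1] for w in within) ** 2 / sum(w[0] for w in within))
    # One regression line for everyone.
    all_x = [a for x, _ in groups for a in x]
    all_y = [b for _, y in groups for b in y]
    sxx, sxy, syy = _sums(all_x, all_y)
    sse_single = syy - sxy ** 2 / sxx

    df_full = n - 2 * k
    df_parallel = n - k - 1
    f1 = ((sse_single - sse_full) / (2 * (k - 1))) / (sse_full / df_full)
    f2 = ((sse_parallel - sse_full) / (k - 1)) / (sse_full / df_full)
    f3 = ((sse_single - sse_parallel) / (k - 1)) / (sse_parallel / df_parallel)
    return {"F1": (f1, (2 * (k - 1), df_full)),
            "F2": (f2, (k - 1, df_full)),
            "F3": (f3, (k - 1, df_parallel))}


# Hypothetical data: opposite slopes in the two groups, small fixed noise.
x = [float(v) for v in range(10)]
noise = [0.1, -0.1] * 5
group_a = (x, [a + e for a, e in zip(x, noise)])
group_b = (x, [-a + e for a, e in zip(x, noise)])
print(homogeneity_of_regression([group_a, group_b])["F2"])
```

A large F2 signals that a common slope (beta weight) cannot be used for both groups. When F2 is not significant, F3 asks whether a common intercept can be used as well.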
Study 3: Toll Facility Officers


The subjects in this investigation consisted of 74 toll facility
officers employed by a state highway department. The sample included 56
white officers and 18 Negro officers. The major duties of these toll
facility officers are maintaining proper traffic flow and enforcing traffic
regulations within the toll facility. Table 11 presents biographical
information on these employees. The only significant difference found
between the white and Negro samples was that the Negro officers had
attained a higher educational level than the white officers.

Table 11: Biographical Data - Toll Facility Officers

                   Group     Mean      s       N(1)    t(2)

Age                Total    34.49     7.38      72
                   White    34.31     7.21      55
                   Negro    35.06     6.33      17     0.36

Education          Total    11.08     1.62      72
(in years)         White    10.84     1.58      55
                   Negro    11.88     1.50      17     2.37*

Tenure             Total    75.80    49.80      71
(in months)        White    72.33    50.06      54
                   Negro    86.82    49.08      17     1.03

(1) Total N may be less than 74 because of incomplete data for some
subjects.

(2) t ratios are between the means of the white and Negro groups.

* p < .05


Predictor  Comparisons 


Two tests are currently given to toll facility officer job applicants.
These are the Otis Quick Scoring and a verbal reasoning test developed by
the state personnel department. The verbal reasoning test (called Booklet
hereafter) has been added recently; therefore, few data are available
with respect to its validity. Table 12 presents the means, standard
deviations and tests of significance of mean differences for the
predictors. The white officers scored significantly higher on the Otis
than the Negro officers. The intercorrelation of the two tests was .69 for
the total group, .77 for the white sample and .09 for the Negro sample.

Table 12: Predictors - Means, Standard Deviations, N's and Tests
of Significance of Mean Differences
Toll Facility Officers

Predictor          Group     Mean      s       N      t(1)

Otis               Total    78.21     7.68     71
                   White    79.33     7.31     54
                   Negro    74.67     7.66     17     2.24*

Booklet            Total    78.23     4.84     23
                   White    78.02     5.11     19
                   Negro    79.26     3.66      4     [illegible]

* p < .05

(1) t ratios are between the means of the Negro and white samples.

Criterion Comparisons

The criterion measures used with the toll facility officers were
attendance (days absent and periods of absence), extension of probationary
period, promotion and supervisory ratings.


Table 13: Criteria - Means, Standard Deviations, N's and
Tests of Significance of Mean Differences
Toll Facility Officers

Criterion          Group     Mean      s       N      t(1)

Extension of       Total     0.22     0.41     65
Probation          White     0.22     0.42     50
                   Negro     0.20     0.41     15     0.16

Promotion          Total     1.69     0.46     72
                   White     1.67     0.47     55
                   Negro     1.76     0.44     17     0.69

Attendance -       Total     8.06    10.99     67
Days Abs.          White     7.86    11.88     51
                   Negro     8.69     7.78     16     0.20

Attendance -       Total     3.04     2.01     67
Periods Abs.       White     2.73     2.48     51
                   Negro     4.06     2.84     16     1.78

Rating by          Total     3.01     0.31     74
Supervisor         White     3.03     0.44     56
                   Negro     2.97     0.10     18     0.85

(1) t ratios are between the means of the Negro and white samples.


The two attendance measures and the extension of probationary period
criterion were defined and scored in this study in the same manner as
described in Study 1 - Toll Collectors. The promotion criterion and the
supervisory ratings of job performance were defined and scored in the same
manner as described in Study 2 - Correctional Officers.

Table 13 presents the means, standard deviations and tests of
significance of mean differences for the criterion measures. There were no
significant differences between the white and Negro subgroups on any
criterion measure.

Validity 

The correlations between the predictors and criteria for the total toll
facility officer sample, the white sample and the Negro sample are shown in
Table 14. If a predictor-criterion relationship fits one of the models
proposed by Bartlett and O'Leary (1969), a number indicating the appropriate
reference figure in Appendix A is enclosed in parentheses beneath the Negro
group correlation.

The validity of the "Booklet" test was difficult to ascertain because
of the small sample to which this test had been given. The Otis Test, in
general, exhibited low validity for the criterion measures. The only
significant correlation for this test was that between the Otis Test and
the criterion of extension of probationary period, and it held for the
white subgroup only.

Models  Illustrated 

The relationship between the Otis Test and the extension of probationary
period criterion illustrates Model 7 of the Bartlett and O'Leary schema
(Figure 7 in Appendix A). There was a significant difference on the
predictor between the Negro and white subgroups but no difference on the
criterion.


Table 14: Predictor - Criterion Correlations
Toll Facility Officers

                                          Predictor
                                   Otis                 Booklet
Criterion          Group        r         N          r           N

Extension of       Total        18        65         61*         17
Probation          White        30*       50         63*         15
                   Negro       -22        15         --(3)        2
                               (7)

Promotion          Total       -03        71        -3?          23
                   White   [illegible]    54        -3?          16
                   Negro       -02        17         --(3)        4

Attendance -       Total       -07        65        -01          17
Days Abs.          White       -03        50         01          15
                   Negro       -2?        15       -100           2

Attendance -       Total       -03        65         01          17
Periods Abs.       White        1?        50         02          15
                   Negro       -25        15       -100           2

Rating by          Total        0?        71        -3?          23
Supervisor         White        02        54     [illegible]  [illegible]
                   Negro        12        17     [illegible]  [illegible]

(1) Decimals are omitted.

(2) Number in parentheses below the correlation for the Negro sample
indicates the model illustrated (See Appendix A).

(3) Nondeterminate correlation due to zero variance in one variable.



The test was a valid predictor for the white subgroup, but lacked validity
for both the total group and the Negro subgroup. The correlation between
the Otis and the extension of probationary period criterion for the white
subgroup was not significantly different from that for the Negro subgroup.
Thus, this predictor-criterion relationship is not illustrative of Model 7
when the additional restraint of a significant difference between the
subgroup validity coefficients is imposed.

The results of the analyses of covariance for homogeneity of regression
for the toll facility officer sample are presented in Table 15. The findings
were consistent with the results of the comparison of the validity estimates
for the two subgroups. No significant F-ratios were obtained for any of the
predictor-criterion pairs.


Table 15: Analysis of Covariance for Homogeneity of Regression
Toll Facility Officer Sample (1)

                                     Otis Test
Criterion           F1(2)    df1       F2(3)    df2       F3(4)    df3

Extension of        1.64    (2,61)     3.22    (1,61)      .06    (1,62)
probation

Promotion            .24    (2,67)      .00    (1,67)      .49    (1,68)

Attendance -         .12    (2,61)      .23    (1,61)      .02    (1,62)
days absent

Attendance -        2.56    (2,61)     1.96    (1,61)     3.12    (1,62)
periods absent

Rating by            .12    (2,67)      .00    (1,67)      .24    (1,68)
supervisor

(1) The analysis of covariance for homogeneity of regression was not
conducted using the Booklet Test as the predictor variable due to the
extremely small sample sizes.

(2) F1 tests hypothesis that E(Y_ij | X_ij) = a + b X_ij for all i groups.

(3) F2 tests hypothesis that E(Y_ij | X_ij) = a_i + b X_ij for all i groups.

(4) F3 tests hypothesis that E(Y_ij | X_ij) = a + b_i X_ij for all i groups
(valid test only if F2 is not significant).


Study 4: Federal Correctional Institution - Inmate Population


Sample

Study 4 consisted of 155 inmates of a Federal Correctional Institution.
Education files of all inmates were searched and a sample of 119 white and
36 Negro subjects was obtained. Table 16 presents background data on the
inmates.


Table 16: Biographical Data - Federal Correctional Institution

                   Group     Mean      s       N      t(1)

Age                Total    21.07     1.95    155
                   White    21.02     2.07    119
                   Negro    20.61     1.68     36     1.08

Education          Total     8.48     1.80    155
(years)            White     8.43     1.7?    119
                   Negro     8.64     2.02     36      .61

(1) t ratios are between the means of the white and Negro samples.

Inspection of the above table reveals that the average inmate age was
approximately 21 years, and the average educational level (highest grade
completed) was 8.5. There were no significant differences in age or
educational level between the white and Negro samples.

Predictor  Comparisons 

Scores on the Revised Beta Examination, administered to all inmates,
were recorded from inmate files. The Revised Beta is a non-verbal
intelligence test commonly used in penal institutions. Table 17 presents
mean scores for white and Negro subjects. Whites scored significantly higher
than Negroes on this test, even though the Beta is a non-verbal test.
This finding is consistent with Tenopyr's (1967) assertion that non-verbal
tests do not necessarily reduce mean differences between white and Negro
subjects.




Table 17: Predictor Means, Standard Deviations, N's, and
Tests of Significance of Mean Differences -
Federal Correctional Institution

Predictor          Group     Mean       s       N      t(1)

Beta IQ            Total    100.63    13.00    155
                   White    103.60    11.66    119
                   Negro     90.81    12.50     36     5.63**

(1) t ratios are between the means of the white and Negro samples.

** p < .01

Criterion  Comparisons 

Two measures of educational performance were obtained. The first was
a monthly rating of the inmates' classroom performance. Inmates were rated
by their instructors using a four-point scale on the following: (1)
Classroom Participation, (2) Utilization of Class Time, (3) Interest and
Initiative, (4) Academic Aptitude, and (5) Achievement. A subject's final
rating was the average of his ratings on these five traits. At least two
monthly ratings were required for a case to be included in the sample.


Table 18: Criterion Means, Standard Deviations, N's, and
Tests of Significance of Mean Differences -
Federal Correctional Institution

Criterion          Group     Mean      s       N      t(1)

Ratings            Total     2.97     .55     115
                   White     2.96     .60      87
                   Negro     3.01     .43      28      .40

Change Score       Total      .00     .7?     130
(SAT)              White      .10     .7?      99
                   Negro     -.?4     .74      31     ?.33

(1) t ratios are between the means of the white and Negro samples.

As shown in Table 18, white and Negro subjects were found to be
approximately equal in terms of mean criterion performance based on the
ratings.


The second criterion measure obtained was a residual gain score (Manning
and DuBois, 1962) based on changes in Stanford Achievement Test scores
before and after the inmates were exposed to educational classes. The
average time between testings was approximately three months. Discussions
with instructors in the educational department indicated their preference
for a gain score as a criterion measure. However, they also pointed out
that a general increase in test scores can be expected due to general
adjustment of inmates to a confined environment.
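
A residual gain score of this kind can be sketched as the part of the post-test score not predicted from the pre-test score by simple linear regression. The sketch below is a minimal illustration under that assumption; the report gives no computational details, the residuals are left unstandardized, and the scores shown are hypothetical.

```python
def residual_gain(pre, post):
    """Residuals of post-test scores regressed on pre-test scores."""
    n = len(pre)
    mean_pre, mean_post = sum(pre) / n, sum(post) / n
    sxx = sum((a - mean_pre) ** 2 for a in pre)
    sxy = sum((a - mean_pre) * (b - mean_post) for a, b in zip(pre, post))
    slope = sxy / sxx
    intercept = mean_post - slope * mean_pre
    # Positive values: the subject gained more than predicted from the pre-test.
    return [b - (intercept + slope * a) for a, b in zip(pre, post)]


# Hypothetical achievement scores before and after classes.
gains = residual_gain([40, 50, 60, 70], [45, 52, 66, 71])
print([round(g, 2) for g in gains])
```

By construction these residuals average zero in the sample on which the regression is fitted and are uncorrelated with the pre-test, so a general upward drift in scores does not inflate every subject's gain.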

Table 18 presents the means, standard deviations, and tests of
significance between means on the Stanford Achievement Test change scores.
No significant differences were found between the two groups.

Validity 

Correlations between the Revised Beta and the criterion measures
are presented in Table 19. The Revised Beta correlated significantly with
the rating criterion for the white and Negro subgroups. However, since the
relationship was in the opposite direction for the two ethnic groups, the
correlation for the total group was not significant. The correlation between
the Beta and the change score criterion was significant for the total sample
and the white subgroup but not significant for the Negro subgroup.


Table 19: Predictor - Criterion Correlations
Federal Correctional Institution

                                Predictor: Beta IQ
Criterion          Group          r            N

Ratings            Total      [illegible]     115
                   White      [illegible]      87
                   Negro      [illegible]      28
                                 (10)

Change Score       Total      [illegible]     130
(SAT)              White      [illegible]      99
                   Negro          09           31
                                  (7)

(1) Decimals are omitted.

(2) Number in parentheses below the correlation for the
Negro sample indicates the model illustrated (See
Appendix A).

* p < .05

** p < .01

a indicates those models in which a significant difference
exists between the validity coefficients for the two
ethnic groups.

Models  Illustrated 

Viewing the data in terms of the models presented by Bartlett and
O'Leary (1969) reveals that Model 10 was demonstrated. (See appropriate
reference Figure in Appendix A). The correlation between the predictor
and rating criterion was positive for the white inmates but negative for
the Negro inmates. Moreover, combining the two groups eliminated the
validity of the Revised Beta as a predictor of the ratings. Thus, unless the
scores were moderated on the basis of race no linear prediction of the
rating criterion would be possible. This is a situation, however, where
non-linear prediction would yield validity.

The relationship between the Revised Beta and the change score criterion
illustrated Model 7. Although the test is appropriate as a predictor for the
white sample, it is inappropriate for the Negro sample. If the test were
used as a selection device, the result would be the rejection of qualified
Negroes.

Only the example of Model 10 met the additional criterion of a
significant difference between validity coefficients, as indicated by the
superscript a in Table 19.

It is important to note that motivation of inmates in the test-taking
situation is indeed a problem. Discussions with instructors raised
questions concerning the reliability of the measures. Thus, the above data
must be interpreted with extreme caution.


Table 20: Analysis of Covariance for Homogeneity of
Regression - Federal Correctional Institution

                                     Beta IQ
Criterion          F1(1)     df1       F2(2)     df2       F3(3)    df3

Ratings            6.64**  (2,111)    10.92**  (1,111)     2.17   (1,112)

Change Score        .66    (2,126)      .02    (1,126)     1.31   (1,127)

(1) F1 tests hypothesis that E(Y_ij | X_ij) = a + b X_ij for all i groups.

(2) F2 tests hypothesis that E(Y_ij | X_ij) = a_i + b X_ij for all i groups.

(3) F3 tests hypothesis that E(Y_ij | X_ij) = a + b_i X_ij for all i groups.

** p < .01


Table 20 presents the results of the regression tests for the analysis
of covariance (Potthoff, 1966). The significant F1 ratio in the
relationship between the Beta IQ and the rating criterion indicates that
bias is present. The significant F2 ratio indicates that the difference in
regression slopes is the major factor contributing to this bias.

None of the F ratios in the relationship between the Beta IQ and
change scores was significant, indicating that no bias was present.


Study 5: Home Office Clerical


Sample 

A representative sample of clerical employees in the home office
of a large industrial organization comprised the subject population of
Study 5. Selecting one out of every five employees yielded a sample of
409 subjects, of whom 363 were white and 46 Negro. Table 21 presents
background characteristics for the total, white and Negro samples.
Inspection of Table 21 reveals that the Negro sample is older and has
been with the firm for a shorter period of time than the white sample.


Table 21: Biographical Data - Home Office Clerical

                   Group     Mean      s       N(1)    t(2)

Age                Total    26.24    10.62     405
                   White    26.72    11.02     359
                   Negro    28.85     6.04      46     3.56**

Tenure             Total     3.89     3.?2     405
(years)            White     4.15     3.59     359
                   Negro     1.89     1.37      46     8.13**

(1) Total N is less than 409 because of incomplete data for some subjects.

(2) t ratios are between the means of the white and Negro samples.

** p < .01

Predictor  Comparisons 

The major purpose of this validation study was to determine the
relative utility of a new version of the Thurstone Test of Mental
Alertness (TMA), as compared to the original TMA administered at the time
of employment.

In addition to the original and new TMA, a company-developed
nonverbal test of reasoning ability (The Picture Selection Index) was
administered to the employees. Since this test was in its early
development, three time limits were examined: 10, 15, and 20 minutes.




Table 22 presents means, standard deviations and tests of
significance between means for the white and Negro samples. No significant
differences were found between racial groups on the original TMA.
However, the white sample scored significantly higher than the Negro
sample on the new version of the TMA. The firm's psychologists indicate
that this difference may be due to the increased verbal content of the
new version.

The mean performance of the two racial groups on the Picture
Selection Index was approximately equal. Moreover, increasing the
time limits did not produce any mean differences between the two groups.

Criterion  Comparisons 

Employees were rated by both their Immediate Supervisor and Office
Manager on the following dimensions, using a nine-point rating scale:
(1) Quickness in Understanding New Material, (2) Accuracy, (3) Numerical
Ability, (4) Verbal Ability, (5) Judgment (the ability to make appropriate
and sound decisions), and (6) Overall Mental Alertness. In addition,
employees were rated on an eight-point scale on their "General
Promotability", a rating of the employee's potential top performance level.

The correlations between the Immediate Supervisor ratings and
Office Manager ratings were:

Quickness            .56         Verbal Ability           [illegible]
Accuracy             .?8         Judgment                 [illegible]
Numerical Ability    .50         Mental Alertness         [illegible]
                                 Promotional Potential    [illegible]

Because of the rather low intercorrelations between the two sets of
ratings, they were not combined into an overall rating of job
performance. Rather, each rating was considered separately. It should
be noted that a general halo factor was present in both samples.

Criterion means for the total group, whites, and Negroes are
presented in Table 23. In general, the job performance of Negroes is rated
as being lower than the job performance of whites. A significant
difference was found between the mean job performance ratings for the two
racial groups on 11 out of the 14 possible rating criteria.


Table 22: Predictor Means, Standard Deviations,
N's and Tests of Significance of Mean Differences
- Home Office Clerical

Predictor                Group      Mean          s            N     t(1)

Original TMA
  Verbal                 Total     33.08        10.74         409
                         White     33.3?        10.88         363
                         Negro     31.52         9.88          46    1.08

  Quantitative           Total     23.67         8.14         409
                         White   [illegible]     8.40         363
                         Negro     23.37         6.70          46     .22

  Total Score            Total     56.87        3?.29         409
                         White     58.25        27.19         363
                         Negro     54.83        15.00          46    3.29

New TMA
  Verbal                 Total     46.60        15.96         409
                         White     47.32        16.02         363
                         Negro     41.22        13.35          46    2.47*

  Quantitative           Total     23.10         7.61         409
                         White     23.60         8.43         363
                         Negro     20.70         6.68          46    2.24*

  Total Score            Total     69.69        20.57         409
                         White     72.02        32.25         363
                         Negro     61.91        16.59          46    3.37**

Picture Selection Index
  10-min. Time Limit     Total     36.14         8.8?         355
                         White     36.54         9.59         318
                         Negro     34.19         7.20          37    1.44

  15-min. Time Limit     Total     48.09         9.56         354
                         White     48.43      [illegible]     318
                         Negro     46.38         6.72          37    [illegible]

  20-min. Time Limit     Total     54.65         9.18         355
                         White     54.57         9.75         318
                         Negro     54.00         6.63          37     .46

(1) t ratios are between means of white and Negro samples.

* p < .05

** p < .01
Table 23: Criteria - Means, Standard Deviations,
N's and Tests of Significance of Mean Differences
Home Office Clerical

Criterion / Rater          Group     Mean      s       N      t(1)

Quickness
  Office Manager           Total     5.84     1.42    371
                           White     5.92     1.41    328
                           Negro     5.26     1.45     43     2.88**

  Immediate Supervisor     Total     6.08     1.43    315
                           White     6.13     1.41    284
                           Negro     5.58     1.50     31     2.04*

Accuracy
  Office Manager           Total     6.07     1.40    352
                           White     6.15     1.39    309
                           Negro     5.49     1.33     43     2.93**

  Immediate Supervisor     Total     6.03     1.5?    315
                           White     6.09     1.52    284
                           Negro     5.45     1.41     31     2.24*

Numerical Ability
  Office Manager           Total     5.74     1.50    317
                           White     5.81     1.47    280
                           Negro     5.27     1.63     37     2.07*

  Immediate Supervisor     Total     5.81     1.39    296
                           White     5.8?     1.41    268
                           Negro     5.32     1.12     28     1.99*

Verbal Ability
  Office Manager           Total     5.59     1.41    371
                           White     5.66     1.41    328
                           Negro     5.02     1.30     43     2.82**

  Immediate Supervisor     Total     5.68     1.39    315
                           White     5.73     1.36    284
                           Negro     5.23     1.59     31     1.91

Judgment
  Office Manager           Total     5.84     1.49    350
                           White     5.90     1.48    308
                           Negro     5.43     1.56     42     1.92

  Immediate Supervisor     Total     5.77     1.53    314
                           White     5.82     1.53    284
                           Negro     5.37     1.45     30     1.54

Overall Mental Ability
  Office Manager           Total     5.87     1.4?    370
                           White     5.95     1.45    327
                           Negro     5.33     1.54     43     2.62*

  Immediate Supervisor     Total     6.09     1.41    315
                           White     6.16     1.39    284
                           Negro     5.39     1.41     31     2.92**

Promotion Potential
  Office Manager           Total     4.71     1.42    358
                           White     4.79     1.59    318
                           Negro     4.05     1.57     40     3.32**

  Immediate Supervisor     Total     4.54     1.45    303
                           White     4.61     1.45    273
                           Negro     3.87     1.33     30     2.66**

(1) t ratios are between means of white and Negro samples.

* p < .05

** p < .01

Since the Negro sample had been with the firm for a shorter period
than the white sample, correlations between tenure and the rating criteria
were computed. The results indicated that job experience was not a major
factor contributing to the obtained criterion differences for the two
racial groups. The only significant relationship was between tenure and
ratings by Office Managers on Numerical Ability for the Negro sample
(r = .36).

Validity

Correlations between the various predictors and criteria are presented
in Table 24. In general, ratings by Office Managers were more predictable
than ratings by Immediate Supervisors for both racial groups.

Considering both the original and new TMA, we find that ratings of
Verbal Ability and Mental Alertness by Office Managers are equally predictable
for both racial groups. Moreover, with the exception of the quantitative
score, the new TMA predicts Office Manager ratings of Numerical Ability and
Promotion Potential for both racial groups equally well.

With  few  exceptions,  ratings  by  Immediate  Supervisors  are  predicted 
by  both  the  original  and  new  TMA  for  the  white  sample  but  are  predictable 
in  only  two  cases  for  the  Negro  sample. 

Increasing the time limit from ten to fifteen minutes tends to increase
the validity of the Picture Selection Index for both racial groups. A
further increase in the time limit from fifteen to twenty minutes tends to
yield a slight increase in validity for the white sample, but in some
instances results in a decrease in validity for the Negro sample.

In  general,  the  Picture  Selection  Index  is  not  as  valid  as  the  original 
and  new  TMA.  This  finding  is  consistent  with  studies  in  the  literature 
which  report  that  nonverbal  tests  are  not  as  valid  as  verbal  tests. 

Models  Illustrated 

The criterion used for identifying models was whether the correlation
between a test and criterion was significantly greater than zero in neither,
both, or one of the subgroups. It is important to note that in a number
of comparisons in Table 24, the absolute magnitude of the correlation for
the Negro sample is larger than the corresponding correlation for the
white sample, but the correlation is not significant in the Negro sample
due to a relatively small sample size.

Considering the new TMA, ten examples of Model 1 emerged. The
number in parentheses below the correlations for the Negro sample in Table
24 indicates the model represented (see appropriate reference figure in
Appendix A). White employees obtained higher mean scores on both the
predictor  and  criterion  in  this  situation,  but  the  validity  coefficients 
were  approximately  equal  for  the  two  racial  groups.  If  it  can  be  assumed 
that  the  rating  criterion  is  unbiased,  then  discrimination  on  the  test 
does  not  constitute  unfair  discrimination,  since  the  test  reflects  a  real 
difference  in  predicted  performance. 

Model  2  illustrates  the  situation  in  which  mean  differences  exist 
on  predictor  performance  for  the  two  racial  groups  but  no  difference  is 
present  in  the  mean  criterion  performance  for  the  two  groups.  Also, 
the  correlation  between  the  predictor  and  criterion  is  significant  for 
both  groups.  This  model,  which  was  illustrated  in  the  relationship 
between  the  new  TMA  and  ratings  of  Verbal  Ability  and  Judgment,  occurred 
three  times. 

Model 3 occurred 16 times. In this model, the validity coefficients
are approximately equal for the two groups. In addition, there are no
differences in the mean predictor scores but significant differences between
racial groups on the criterion. If the tests were validated only on the
total group, the result would be an underprediction of performance for the
white sample and an overprediction for the Negro sample. Differential
prediction would yield more accurate prediction for both groups.
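
The under- and overprediction that results from total-group validation can be made concrete with a small numerical sketch. The scores below are invented for illustration and are not taken from Table 24; the helper names `fit_line` and `mean_error` are assumptions of this sketch. Both hypothetical groups have identical test scores and equal slopes, but one group is rated one point higher throughout.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: same test scores and same slope in both groups,
# but group A is rated one point higher than group B at every score.
test_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
ratings_a = [2.0 + 0.5 * x for x in test_scores]
ratings_b = [1.0 + 0.5 * x for x in test_scores]

# Total-group validation: a single regression line fitted to everyone.
slope, intercept = fit_line(test_scores * 2, ratings_a + ratings_b)


def mean_error(xs, ys):
    """Mean of (actual - predicted); a positive value means underprediction."""
    return sum(y - (intercept + slope * x) for x, y in zip(xs, ys)) / len(xs)


error_a = mean_error(test_scores, ratings_a)  # positive: group A underpredicted
error_b = mean_error(test_scores, ratings_b)  # negative: group B overpredicted
```

Fitting a separate line to each group would reduce both mean errors to zero in this example, which is the sense in which differential prediction is more accurate for both groups.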

Model 5 is illustrated in the relationship between the Picture Selection
Index and ratings of Judgment by Immediate Supervisors. Negro and white
employees perform approximately equally on both the predictor and criterion,
but the test is valid only for the white sample. The frequency of this
model was 10.

Forty-three cases of Model 6 were found, as illustrated in many of the
relationships between the Picture Selection Index and the various rating
criteria, and in some original TMA-criterion relationships. In this
model the two groups differ in mean performance on the criterion as well
as validity, but there is no difference in the predictor performance
for the two racial groups. If this test were used in selection, the
result would be to select only those white individuals with a high
probability of success on the job, but to select Negro individuals whose
probability of success on the job is not known.

The  relationship  between  the  new  TMA  and  ratings  of  Judgement  illustrates 
Model  7.  White  employees  score  significantly  higher  than  Negroes  on  the 
predictor,  but  mean  criterion  ratings  were  approximately  equal.  However,  the 
test  was  valid  only  for  the  white  sample.  This  model  occurred  five  times. 

Twenty-two  examples  of  Model  8  were  illustrated  in  the  relationships 
between  scores  on  the  new  TMA  and  the  various  rating  criteria.  White 
employees  scored  higher  than  Negro  employees  on  both  the  predictor  and 
criterion  measures,  but  the  test  was  valid  only  for  the  white  sample.  One 
can  make  valid  predictions  using  a  combined  group  validation  procedure  even 
though  the  test  is  not  valid  for  the  Negro  group,  since  the  test  identifies 
the  lower  performing  group  of  Negroes.  However,  it  is  inappropriate  to  use 
the  test  to  select  Negroes. 

Model 11, the final model illustrated in this sample, represents
the situation in which a test is valid for both racial groups combined
but has no validity for each subgroup separately. This model is illustrated
in the relationship between the quantitative section of the new TMA and
ratings of Accuracy by Immediate Supervisors.
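
One way a test can be valid for the combined group while showing no validity within either subgroup is sketched below. The eight score pairs are invented for illustration (they are not data from this study), and `pearson_r` is a helper written for this sketch. Within each hypothetical group the test and rating are uncorrelated, but the groups differ in mean on both measures, so pooling them produces a substantial correlation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Hypothetical scores: no relationship inside either group,
# but group B is higher than group A on both test and rating.
group_a_test, group_a_rating = [1, 2, 3, 4], [2, 1, 1, 2]
group_b_test, group_b_rating = [5, 6, 7, 8], [4, 3, 3, 4]

r_a = pearson_r(group_a_test, group_a_rating)  # zero within group A
r_b = pearson_r(group_b_test, group_b_rating)  # zero within group B
r_combined = pearson_r(group_a_test + group_b_test,
                       group_a_rating + group_b_rating)  # roughly .78
```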

As  indicated  above,  the  criterion  used  for  identifying  the  above 
models  was  whether  the  correlation  between  a  test  and  criterion  was  sig¬ 
nificantly  greater  than  zero  in  neither,  both,  or  one  of  the  subgroups. 
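
The identification rule can be summarized as a lookup on three observations: whether the predictor means differ, whether the criterion means differ, and in how many subgroups the test is valid. The sketch below covers only the models characterized in this section (Models 1, 2, 3, 5, 6, 7, and 8). Model 11 depends on combined-group validity and is not represented, and patterns defined only in Appendix A return `None`. The function and argument names are illustrative and do not come from the report.

```python
# Key: (predictor means differ, criterion means differ, subgroups in which
# the test-criterion correlation is significantly greater than zero).
MODELS_DESCRIBED = {
    (True, True, "both"): 1,
    (True, False, "both"): 2,
    (False, True, "both"): 3,
    (False, False, "one"): 5,
    (False, True, "one"): 6,
    (True, False, "one"): 7,
    (True, True, "one"): 8,
}


def identify_model(predictor_differs, criterion_differs, valid_in):
    """Return the model number, or None if the pattern is not described here.

    valid_in must be "both", "one", or "neither".
    """
    key = (predictor_differs, criterion_differs, valid_in)
    return MODELS_DESCRIBED.get(key)


# Example: whites score higher on the test, ratings are equal,
# and the test is valid in only one subgroup.
example = identify_model(True, False, "one")
```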

An additional criterion can be applied to Models 5 through 10: that a
significant difference must exist between the validity coefficients for the two
racial groups. Applying this somewhat more restrictive criterion completely
eliminates the Model 5, 6, 7, and 8 examples.
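
The more restrictive criterion asks whether two validity coefficients differ from each other, not whether each differs from zero. The report does not say which test was applied; the sketch below assumes the common Fisher z transformation for correlations from independent samples. The correlations and sample sizes are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def compare_correlations(r1, n1, r2, n2):
    """Fisher z test for two correlations from independent samples.

    Returns (z, two_tailed_p).
    """
    standard_error = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (atanh(r1) - atanh(r2)) / standard_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical case: r = .30 in a sample of 100 and r = .10 in a sample of 40.
z, p = compare_correlations(0.30, 100, 0.10, 40)
```

In this hypothetical case the first correlation would be significant on its own and the second would not, yet the difference between them (z of about 1.08) falls well short of significance. This is the kind of situation in which the stricter criterion removes an apparent example of differential validity.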

The analysis of covariance for homogeneity of regression (Potthoff,
1966) yields results which are consistent with the more restrictive definition
of bias.

None of the F ratios for regression slopes was significant, indicating that a
common regression slope was appropriate for both racial groups. Table 25 presents
the results of this analysis. The original TMA demonstrated the most bias
using this method of analysis, as indicated by the frequently significant
F ratios for intercepts. A significant ratio of this kind means that a common
intercept cannot be used for the two racial groups.
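
The two tests can be sketched as nested regression comparisons: separate lines versus a common slope, and then a common slope versus a single line. This is a minimal illustration assuming that standard formulation; the report does not give its computing formulas, the function names are invented here, and the data are hypothetical.

```python
def within_sums(xs, ys):
    """Sums of squares and cross-products about the means of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxx, sxy, syy


def slope_and_intercept_f(groups):
    """groups is a list of (test_scores, ratings) pairs, one per subgroup."""
    k = len(groups)
    n_total = sum(len(xs) for xs, _ in groups)
    sums = [within_sums(xs, ys) for xs, ys in groups]

    # Residual sum of squares with a separate line for every subgroup.
    sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)

    # Residual sum of squares with a common slope but separate intercepts.
    sxx_w = sum(s[0] for s in sums)
    sxy_w = sum(s[1] for s in sums)
    syy_w = sum(s[2] for s in sums)
    sse_common_slope = syy_w - sxy_w ** 2 / sxx_w

    # Residual sum of squares with one line for everyone.
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    sxx_t, sxy_t, syy_t = within_sums(all_x, all_y)
    sse_single = syy_t - sxy_t ** 2 / sxx_t

    f_slope = ((sse_common_slope - sse_separate) / (k - 1)) / (
        sse_separate / (n_total - 2 * k))
    f_intercept = ((sse_single - sse_common_slope) / (k - 1)) / (
        sse_common_slope / (n_total - k - 1))
    return f_slope, f_intercept


# Hypothetical groups: same slope, ratings one point apart.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
ratings_a = [2.1, 2.4, 3.1, 3.4, 4.1]
ratings_b = [y - 1.0 for y in ratings_a]
f_slope, f_intercept = slope_and_intercept_f(
    [(scores, ratings_a), (scores, ratings_b)])
```

With two groups the slope test has 1 and N - 4 degrees of freedom and the intercept test has 1 and N - 3. In the hypothetical data the slope ratio is essentially zero while the intercept ratio is large, the pattern described above for the original TMA.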

It  should  be  noted  that  comparing  only  mean  test  performance  one 
would  conclude  that  the  original  TMA  was  less  biased  than  the  new  TMA 
since  white  employees  score  higher  than  Negro  employees  on  the  new  TMA. 
However,  considering  both  test  and  criterion  performance,  as  well  as  the 
relationship  between  them,  one  concludes  that  the  original  TMA  is  more 
biased  than  the  new  TMA  in  this  particular  sample. 
[Table 25: body not recoverable. Surviving column headings are ratings of
Numerical Ability, Verbal Ability, and Judgement by Immediate Supervisors
(Imm. Sup.) and Office Managers (Off. Mgr.); surviving entries are degrees
of freedom for the F ratios, such as (2,266), (1,266), and (1,267).]
Study 6: Catalog Order Plants


Sample 

Study 6 consisted of 810 employees of a large retail organization of whom
472  were  white,  287  Negro,  and  51  Latin  American.  All  jobs  were  essentially 
clerical  in  nature  and  most  required  some  arithmetic  skills.  The  sample  has 
been  broken  down  into  specific  job  classifications  wherever  feasible. 

Predictors

The same predictors were used for all job classifications. Two experimental
clerical tests, developed by the firm's psychologists, were administered to
all employees. Clerical I consists of two columns of names and numbers and the
task of the subject is to determine whether each pair is alike or different.
Clerical II is a number cancellation task in which the subject is required to
strike out all numbers in a column that are the same as the underlined number
at the top of the column. Since these tests were experimental in nature, two
time limits were examined: 5 minutes and 10 minutes. Also, each test was scored
in two ways: (1) Number Correct and (2) Number Correct minus Number Wrong.
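
The two scoring methods can be sketched as follows. The answer key, the responses, and the treatment of unattempted items (ignored under both methods) are assumptions made for illustration; the report does not reproduce the test items or the scoring instructions.

```python
def score_clerical(responses, key):
    """Return (number_correct, number_correct_minus_number_wrong).

    An unattempted item is recorded as None and counts as neither
    right nor wrong.
    """
    attempted = [(given, correct) for given, correct in zip(responses, key)
                 if given is not None]
    right = sum(1 for given, correct in attempted if given == correct)
    wrong = len(attempted) - right
    return right, right - wrong


# Hypothetical five-item segment of a name and number comparison test.
key = ["alike", "alike", "alike", "alike", "different"]
responses = ["alike", "different", None, "alike", "alike"]
number_correct, right_minus_wrong = score_clerical(responses, key)
```

Here two of the four attempted items are right and two are wrong, so the Number Correct score is 2 and the Number Correct minus Number Wrong score is 0. The second method penalizes fast but careless work.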

In  addition  to  the  two  experimental  tests,  scores  on  a  company  developed 
Arithmetic  Reasoning  test  and  a  Verbal  Reasoning  test  were  obtained  for  all 
employees  in  the  sample. 

Criteria 

Ratings  by  supervisors  were  obtained  for  all  employees.  The  rating 
instrument  was  a  seven  point  scale  developed  by  the  firm’s  psychologists 
covering  the  following  dimensions: 

(1) Accuracy: The ability to work without making errors.

(2)  Accuracy  under  Pressure:  The  ability  to  turn  in  accurate  work 
under  differing  conditions  of  pressure. 

(3)  Work  Speed:  The  pace  at  which  a  person  works. 

(4) Learning Ability: The ability to understand directions and learn
from the directions provided.

(5)  Human  Relations:  The  ability  to  maintain  good  relations  with  others. 

(6)  General  Overall  Effectiveness. 


Background  Data  -  Merchandise  Handlers  I 


Table  26  presents  the  biographical  data  obtained  for  this  job  classifi¬ 
cation.  A  number  of  employees  of  Latin  American  extraction  were  employed  in 
this  job  classification  in  addition  to  the  Negro  minority.  Each  minority 
group  was  compared  separately  with  the  white  sample. 

Table 26: Biographical Data - Merchandise Handlers I

                      Group          X        s      N       t

Age                   Total      30.79    11.43    190
                      White      35.52    13.45     86
                      Negro      26.61     7.18     84    5.37** (1)
                      Latin      28.00     8.83     20    3.01** (2)

Tenure                Total       2.35     1.15    190
(Years)               White       2.92     1.19     86
                      Negro       1.80      .86     84    7.00**
                      Latin       2.20      .83     20    2.54*

Education             Total      10.72     1.81    190
(Years)               White       9.95     1.84     86
                      Negro      11.58     1.22     84    6.78**
                      Latin      10.30     2.18     20     .74

(1) t ratios are between the means of the white and Negro samples.
(2) t ratios are between the means of the white and Latin samples.
 *p<.05
**p<.01
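
The t ratios can be checked from the summary statistics alone. The report does not state its formulas, so the sketch below is an assumption: a pooled-variance t comes out near the tabled 5.37 for the white-Negro Age comparison, and a separate-variance form with n - 1 in each denominator comes out near 3.01 for the white-Latin Age comparison, where the two standard deviations differ more. The function names are illustrative.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t ratio using a pooled estimate of the within-group variance."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))


def separate_variance_t(mean1, sd1, n1, mean2, sd2, n2):
    """t ratio that keeps each group's own variance, divided by n - 1."""
    return (mean1 - mean2) / sqrt(sd1 ** 2 / (n1 - 1) + sd2 ** 2 / (n2 - 1))


# Age, Merchandise Handlers I: white versus Negro, then white versus Latin.
t_age_negro = pooled_t(35.52, 13.45, 86, 26.61, 7.18, 84)
t_age_latin = separate_variance_t(35.52, 13.45, 86, 28.00, 8.83, 20)
```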

Negro and Latin employees, as compared to their white counterparts,
are younger and have been with the firm for a shorter period of time. The
educational level of the Negro employees is significantly higher than that
of the white employees. However, the educational level of the white and
Latin employees is approximately equal.
Predictor Comparisons


Mean  predictor  scores  for  the  total,  white,  Negro,  and  Latin  samples  are 
presented in Table 27. White employees score significantly higher than either
the  Negro  or  the  Latin  sample  on  the  Verbal  Reasoning  Test.  There  were  no 
significant  differences  between  the  performance  of  the  two  minority  groups  and 
the  white  sample  on  any  of  the  other  predictors. 

It  should  be  noted  that,  although  the  mean  differences  between  each 
minority  group  and  the  white  sample  were  not  significant,  a  rather  consistent 
ranking  pattern  emerged  across  all  predictors:  white  employees  scored  higher 
than  Negro  employees  who,  in  turn,  scored  higher  than  Latin  employees. 

Criterion  Comparisons 

As  indicated  in  Table  28,  there  were  no  differences  in  the  Job  performance 
of  the  three  ethnic  groups  as  measured  by  supervisory  ratings. 

Validity 

Table 29 presents validity coefficients for the total, white and Negro
samples. Since a significant relationship was found to exist between tenure
and the various criteria, correlations have been controlled for tenure where
appropriate. The clerical tests appear equally valid across all criteria.
This generalization holds regardless of the time limit imposed or the
utilization of a correction-for-guessing formula.
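
Controlling a test-criterion correlation for tenure is commonly done with a first-order partial correlation. The report does not specify its method, so the sketch below is an assumption, and the three correlations in the example are hypothetical.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant.

    x is the test, y the rating criterion, and z tenure.
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values: the test correlates .50 with the rating,
# .30 with tenure, and tenure correlates .40 with the rating.
r_controlled = partial_correlation(0.50, 0.30, 0.40)
```

In this example the validity drops from .50 to about .43 once tenure is held constant. When tenure is unrelated to both the test and the rating, the partial correlation equals the original coefficient.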

All  forms  of  Clerical  Tests  I  and  II  were  valid  predictors  of  the  six 
rating criteria. Moreover, with few exceptions, the validity coefficients
were approximately equal for the white and Negro samples. Validities for
both  the  Verbal  Reasoning  and  the  Arithmetic  Reasoning  Tests  tended  to  be 
lower  than  those  of  Clerical  Tests  I  and  II. 

Predictor-criterion correlations for the total, white, and Latin samples
are presented in Table 30. Inspection of the table reveals that even though
the absolute magnitudes of the correlations for the Latin sample are relatively
high, sometimes exceeding those for the white sample, only a few are
statistically significant. Clerical Test I predicts more criteria for the Latin
sample than any of the other predictors.
Table 27: Predictors - Means, Standard Deviations,
N's, and Tests of Significance of Mean Differences
Merchandise Handlers I

Predictor             Group          X        s      N       t

Verbal                Total      18.04     8.72    190
Reasoning             White      20.29     9.66     86
                      Negro      17.15     7.44     84    2.36* (1)
                      Latin      12.05     5.66     20    4.94** (2)

Arithmetic            Total      20.71     7.37    190
Reasoning             White      21.48     7.97     86
                      Negro      20.40     6.90     84     .94
                      Latin      18.65     6.40     20    1.47

Clerical I            Total      42.97    13.4?    190
5 minutes             White      44.20    14.69     86
                      Negro      42.55    12.51     84     .78
                      Latin      39.50    11.68     20    1.33

Clerical I            Total      89.02    26.77    190
10 minutes            White      91.58    28.18     86
                      Negro      87.55    25.71     84     .97
                      Latin      84.20    24.95     20    1.07

Clerical II           Total      54.89    12.98    190
5 minutes             White      55.79    14.50     86
                      Negro      54.07    11.51     84     .85
                      Latin      54.50    12.21     20     .37

Clerical II           Total     107.21    22.84    190
10 minutes            White     109.85    23.83     86
                      Negro     105.10    22.23     84    1.33
                      Latin     104.85    20.89     20     .86

Clerical I            Total      35.96    16.03    190
(R-W)                 White      37.24    17.43     86
5 minutes             Negro      35.74    14.6?     84     .60
                      Latin      31.40    15.07     20    1.38

Clerical I            Total      76.86    33.89    190
(R-W)                 White      79.06    34.74     86
10 minutes            Negro      75.95    29.34     84     .63
                      Latin      70.70    29.88     20     .99

Clerical II           Total      47.47    15.05    190
(R-W)                 White      47.99    16.85     86
5 minutes             Negro      47.08    13.74     84     .38
                      Latin      46.85    12.44     20     .28

Clerical II           Total      92.71    26.24    190
(R-W)                 White      94.74    28.05     86
10 minutes            Negro      90.93    25.65     84     .92
                      Latin      91.40    20.55     20     .50

(1) t ratios are between the means of the white and Negro samples.
(2) t ratios are between the means of the white and Latin samples.
 *p<.05
**p<.01
Table 28: Criteria - Means, Standard Deviations, N's
and Tests of Significance of Mean Differences
Merchandise Handlers I

Criterion             Group          X        s      N       t

Accuracy              Total       4.01     1.08    190
                      White       4.02      .96     86
                      Negro       3.92     1.16     84    [illegible] (1)
                      Latin       4.30     1.22     20    1.11 (2)

Accuracy              Total       3.88     1.06    190
Under                 White       3.77      .98     86
Pressure              Negro       3.95     1.12     84    1.11
                      Latin       4.05     1.15     20    1.11

Work                  Total       3.92     1.01    190
Speed                 White       3.90     1.04     86
                      Negro       3.85     1.01     84     .32
                      Latin       4.30     1.15     20    1.59

Learning              Total       3.98      .97    190
Ability               White       3.97      .93     86
                      Negro       4.00      .96     84     .21
                      Latin       3.95     1.23     20     .08

Human                 Total       4.10     1.08    190
Relations             White       4.06     1.02     86
                      Negro       4.12     1.09     84     .37
                      Latin       4.20     1.28     20     .52

Overall               Total       4.08      .91    190
Effectiveness         White       4.05      .85     86
                      Negro       4.06      .95     84     .07
                      Latin       4.35      .99     20    1.37

(1) t ratios are between the means of the white and Negro samples.
(2) t ratios are between the means of the white and Latin samples.
Models Illustrated


It should be emphasized that in the majority of predictor-criterion
relationships examined for the white and Negro samples, no bias was shown.

Three models were illustrated in the comparisons of the white and Negro
samples. The number of the model illustrated is shown below the correlations
for the Negro sample in Table 29. The relationship between Verbal Reasoning
and ratings of Accuracy Under Pressure demonstrates Model 2. White employees
scored higher on the predictor but there was no difference between the two
ethnic groups on the criterion. Moreover, the validity coefficients were
approximately equal for the two groups. Using a total group validation
procedure would result in the elimination of Negroes whose probability of job
success is equal to that of the white employees selected.

The most frequently illustrated model was Model 5, occurring seven times.
Model 5 is illustrative of the situation where a test has validity for one
group, none for the other, yet mean performance on both the predictor and
criterion is not significantly different for the two groups. In four of the
seven cases, the test was valid only for the white sample. The use of such tests
as selection instruments would result in the selection of better performing
employees from the valid group, while no increase in prediction efficiency
is obtained by using the test for selection of individuals from the non-valid
group.

Model 7 was illustrated in the relationship between Verbal Reasoning
and ratings of Learning Ability. Again, white employees score higher on the
predictor than Negroes but their job performance is approximately equal.
However, the test is valid only for the white subgroup. Since the Negro
sample scores lower on the predictor, the probability of a Negro being selected
is lower than the probability of a white being selected. Thus, by using such a
test as a selection device one would eliminate Negroes whose probability of
job success is equal to that of the white individuals selected.

Inspection of Table 30 reveals that forty cases of Model 5 were
represented in the comparisons of the validity patterns for the Latin and
white samples. Because of the small sample size for the Latin sample, a
rather large correlation (r ≥ .44) is required for significance at the .05
level. Thus, a number of the correlations for the Latin sample may not be
significant even though the absolute magnitude of the correlation is larger
than the significant correlation for the white sample.
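
The dependence of the required correlation on sample size follows from the t test for a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2). Solving for r gives the smallest significant correlation for a given critical t. The sketch below assumes a two-tailed test at the .05 level; the critical values are the standard tabled ones and the function name is illustrative.

```python
from math import sqrt


def critical_r(t_critical, n):
    """Smallest correlation reaching significance in a sample of size n."""
    degrees_of_freedom = n - 2
    return t_critical / sqrt(t_critical ** 2 + degrees_of_freedom)


# Latin sample, n = 20: tabled two-tailed .05 value of t for 18 df is 2.101.
r_needed_latin = critical_r(2.101, 20)

# White sample, n = 86: the two-tailed .05 value of t for 84 df is about 1.99.
r_needed_white = critical_r(1.99, 86)
```

The Latin sample needs a correlation of about .44, while about .21 is enough in the white sample. A Latin correlation of .35 would therefore be reported as not significant even though it exceeds a significant white correlation of .25.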

Three  cases  of  Model  7  were  illustrated  in  the  relationship  between  Verbal 
Reasoning  and  the  rating  criteria.  Although  the  ratings  were  approximately 
equal for the two ethnic groups, the predictor was valid only for the white
sample.  Since  the  Latin  sample  obtained  lower  predictor  scores,  they  would 
have  a  lower  probability  of  being  selected,  even  though  the  criterion  per¬ 
formance  of  the  two  ethnic  groups  was  similar. 

Applying  the  additional  criterion  of  a  significant  difference  between 
validity  coefficients  eliminates  all  illustrations  of  Models  5  and  7  in  both 
the  white  and  Negro  comparisons  as  well  as  the  white  and  Latin  comparisons. 

Table 31 presents the results of the regression tests of the analysis
of covariance (Potthoff, 1966). This analysis simultaneously tests the
hypothesis that the regression slopes and intercepts are equal for the three
ethnic groups. None of the F ratios was significant, indicating that no
bias was present.
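
A simultaneous test of slopes and intercepts can be sketched as a comparison between one regression line fitted to all groups and a separate line fitted to each group. The code assumes that standard formulation, since the report does not give its computing formulas. The data are hypothetical and use two groups for brevity, although the function accepts any number of groups.

```python
def residual_ss(xs, ys):
    """Residual sum of squares about the least-squares line for xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def coincidence_f(groups):
    """F ratio for the hypothesis that all groups share a single line.

    groups is a list of (test_scores, ratings) pairs.
    Returns (f, df_numerator, df_denominator).
    """
    k = len(groups)
    n_total = sum(len(xs) for xs, _ in groups)
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    ss_single_line = residual_ss(all_x, all_y)
    ss_separate_lines = sum(residual_ss(xs, ys) for xs, ys in groups)
    df_num = 2 * (k - 1)
    df_den = n_total - 2 * k
    f = ((ss_single_line - ss_separate_lines) / df_num) / (
        ss_separate_lines / df_den)
    return f, df_num, df_den


# Hypothetical groups with equal slopes but ratings one point apart.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
ratings_a = [2.1, 2.4, 3.1, 3.4, 4.1]
ratings_b = [y - 1.0 for y in ratings_a]
f, df_num, df_den = coincidence_f([(scores, ratings_a), (scores, ratings_b)])
```

A nonsignificant F means that a single regression line serves all groups, which is the sense in which no bias was found. With three groups the numerator has 4 degrees of freedom and the denominator N - 6.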


Background  Data  -  Merchandise  Handlers  II 

Table  32  presents  the  biographical  data  for  the  above  job 
classification  sample.  This  sample  included  a  small  number  of 
employees of Latin-American extraction.

Negro  employees  in  this  job  classification  were  younger  than  the 
white  employees  and  had  relatively  shorter  company  service.  The 
mean  educational  level  of  the  Negro  sample  was  approximately  one  year 
above the white sample. Biographical characteristics of the Latin
sample  tended  to  be  similar  to  the  white  sample.  Mean  scores  for 
the  two  groups  did  not  differ  significantly. 

Table 32: Biographical Data - Merchandise Handlers II

                      Group          X        s      N       t

Age                   Total      29.58    10.09    264
                      White      32.10    12.15    122
                      Negro      27.09     6.95    125    3.95** (1)
                      Latin      29.76     9.27     17    [illegible] (2)

Tenure                Total       2.50     1.12    264
(Years)               White       2.84     1.16    122
                      Negro       2.16      .97    125    4.97**
                      Latin       2.59     1.18     17     .83

Education             Total      11.04     1.92    259
(Years)               White      10.65     2.18    118
                      Negro      11.48     1.53    124    3.40**
                      Latin      10.53     2.00     17     .21

(1) t ratios are between the means of the white and Negro samples.
(2) t ratios are between the means of the white and Latin samples.
 *p<.05
**p<.01


Predictor Comparisons

Mean predictor scores for the total group, whites, Negroes,
and Latins are presented in Table 33. White employees scored
significantly higher than Negro employees on all tests except Clerical
Test II. It is important to note that increasing the time limits of
the tests did not reduce these racial differences.

Predictor scores for the Latin sample tended to approximate
those of the white sample. Scores for these two ethnic groups differed
only in one comparison: white employees obtained higher scores than
Latins on the Verbal Reasoning Test.

Criterion  Comparisons 

Mean criterion data for the three ethnic groups are presented
in Table 34. Ratings for white employees were significantly higher
than  those  for  Negro  employees  only  on  the  criterion  of  Learning  Ability. 
Correlations  between  tenure  and  the  rating  criteria  were  not  significant, 
indicating  that  experience  was  not  a  major  factor  contributing  to  the 
obtained  mean  criterion  differences  for  the  white  and  Negro  samples. 

Comparisons  of  the  mean  criterion  performance  of  the  Latin  and 
white  samples  yielded  no  significant  differences. 

Validity 

Correlations between the predictors and criteria are presented
in Table 35. Again the clerical tests produced higher correlations
with the various criteria than either the Verbal or Arithmetic
Reasoning  Test.  Similar  validity  patterns  were  exhibited  by  both  of 
the  clerical  tests  with  Accuracy,  Learning  Ability,  and  Work  Speed 
being  the  most  predictable  criteria. 

Comparing the Negro and white samples, we find that in 18 out
of a possible 60 instances, a test correlated significantly with the
criterion for one racial group but not the other. It should be
noted that it was not always the white group which was more predictable.
In fact, in over half of these cases the test was valid for
the Negro sample, but not valid for the white sample.

With few exceptions, increasing the time limit on the clerical
tests from five to ten minutes resulted in an increase in the validity
coefficients for all ethnic groups.


Table 33: Predictors - Means, Standard Deviations,
N's and Tests of Significance of Mean Differences
Merchandise Handlers II

Predictor             Group          X        s      N       t

Verbal                Total      19.61    10.44    264
Reasoning             White      22.44    11.02    122
                      Negro      17.80     9.42    125    3.54** (1)
                      Latin      12.65     6.96     17    3.54** (2)

Arithmetic            Total      21.14     8.81    264
Reasoning             White      23.75     9.53    122
                      Negro      18.42     7.49    125    4.87**
                      Latin      22.41     6.69     17     .71

Clerical I            Total      40.38    12.21    264
5 minutes             White      42.84    12.08    122
                      Negro      37.56    11.77    125    3.46**
                      Latin      43.53    12.47     17     .22

Clerical I            Total      85.08    24.92    264
10 minutes            White      91.02    24.10    122
                      Negro      78.34    24.20    125    4.11**
                      Latin      92.06    24.50     17     .17

Clerical II           Total      53.48    12.06    264
5 minutes             White      54.61    12.83    122
                      Negro      52.65    11.14    125    1.27
                      Latin      51.53    12.81     17     .92

Clerical II           Total     103.45    22.60    264
10 minutes            White     105.8?    21.79    122
                      Negro     101.38    23.32    125    1.43
                      Latin     101.94    22.67     17     .68

Clerical I            Total      33.31    14.37    264
(R-W)                 White      36.32    14.27    122
5 minutes             Negro      30.02    13.73    125    3.52**
                      Latin      35.94    15.21     17     .10

Clerical I            Total      73.26    28.96    264
(R-W)                 White      80.6?    27.50    122
10 minutes            Negro      65.10    28.09    124    4.59**
                      Latin      79.94    30.96     17     .10

Clerical II           Total      46.73    12.95    264
(R-W)                 White      48.68    13.69    122
5 minutes             Negro      45.11    11.60    124    2.20**
                      Latin      44.59    15.60     17    1.13

Clerical II           Total      90.52    23.46    264
(R-W)                 White      93.71    24.01    122
10 minutes            Negro      87.4?    22.26    124    2.12**
                      Latin      90.29    26.28     17     .44

(1) t ratios are between the means of the white and Negro samples.
(2) t ratios are between the means of the white and Latin samples.
 *p<.05
**p<.01



Table 34: Criteria - Means, Standard Deviations,
N's and Tests of Significance of Mean Differences
Merchandise Handlers II

Criterion             Group          X        s      N       t

Accuracy              Total       3.81     1.01    264
                      White       3.86     1.06    122
                      Negro       3.77      .94    125
                      Latin       3.77     1.20     17     .32

Accuracy              Total       3.74     1.01    264
Under                 White       3.75     1.06    122
Pressure              Negro       3.70      .97    125     .39
                      Latin       3.94     1.03     17     .69

Work                  Total       3.79     1.03    264
Speed                 White       3.91     1.14    122
                      Negro       3.66      .88    125    1.92
                      Latin       3.88     1.22     17     .10

Learning              Total       3.66      .88    264
Ability               White       3.79      .90    122
                      Negro       3.53      .84    125    2.34*
                      Latin       3.65     1.00     17     .59

Human                 Total       3.85     1.05    264
Relations             White       3.86     1.15    122
                      Negro       3.86      .97    125     .00
                      Latin       3.76      .83     17     .34

Overall               Total       3.80     1.08    264
Effectiveness         White       3.89     1.21    122
                      Negro       3.70      .94    125    1.44
                      Latin       4.00     1.06     17     .32

(1) t ratios are between the means of the white and Negro samples.
(2) t ratios are between the means of the white and Latin samples.
Although the Latin sample closely resembles the white sample
with regard to both predictor and criterion performance, the test
validity pattern of the white sample was not mirrored closely by
the Latin sample. It should be noted that a relatively large correlation
(r > .46) was required for significance at the .05 level for the small
Latin sample.

Models  Illustrated 

Six  cases  of  Model  1  were  illustrated  in  the  comparisons  between 
the white and Negro samples. The number of the specific model
illustrated is shown below the correlations for the Negro sample in Table
35. The white sample scored significantly higher than the Negro
sample  on  the  clerical  tests  and  also  on  the  criterion  of  Learning 
Ability.  Moreover,  the  validity  is  approximately  equal  for  the  two 
racial  groups.  In  this  situation  discrimination  on  the  test  reflects 
a  real  difference  in  predicted  performance.  Thus,  selection  with  the 
test  does  not  constitute  unfair  discrimination. 

Model  2,  occurring  24  times,  represents  the  situation  where 
there  is  a  significant  difference  between  the  mean  predictor  scores 
for  the  two  racial  groups,  yet  no  significant  difference  in  the  cri¬ 
terion.  The  correlation  between  the  predictor  and  criterion  is  approximately 
equal  for  the  two  groups.  If  a  cutting  score  were  set  on  the  basis 
of  the  total  sample,  the  Negro  group  would  not  have  an  equal  probability 
of  being  selected,  even  though  their  chances  of  job  success  were 
essentially  equal. 
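
The effect of a single cutting score on two groups with different predictor means can be illustrated with the Arithmetic Reasoning means and standard deviations from Table 33. The sketch assumes normally distributed scores within each group and places the cutting score at the total-sample mean; both are assumptions made for illustration. The test is used only because its group means differ, not as a claim that it is a Model 2 case.

```python
from statistics import NormalDist


def selection_rate(group_mean, group_sd, cutting_score):
    """Proportion of a normally distributed group scoring above the cutoff."""
    return 1.0 - NormalDist(group_mean, group_sd).cdf(cutting_score)


cutting_score = 21.14  # total-sample mean on Arithmetic Reasoning
rate_white = selection_rate(23.75, 9.53, cutting_score)
rate_negro = selection_rate(18.42, 7.49, cutting_score)
```

Under these assumptions roughly 61 percent of the white group and 36 percent of the Negro group would exceed the cutting score. Where criterion performance is equal, as in Model 2, that gap in selection rates is not matched by any gap in expected job success.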

Two illustrations of Model 3 were represented in the relationship
between Clerical Test II and ratings of Learning Ability. Validities
for the two racial groups were essentially equal. Although there
was no difference in the mean predictor performance for the two racial
groups, the white sample obtained higher ratings of job performance.
Total group validation would result in an underprediction for white
employees and an overprediction for Negro employees.

Examining the relationship between scores on Clerical Test II
(5 minutes) and four criteria, we find four cases of Model 5. Predictor
and criterion performance was approximately equal for the two ethnic
groups. However, the test was valid for one group but not the other.
In over half of these cases the predictor was valid for the white sample,
but invalid for the Negro sample. The result of using such a test would
be the selection of better performing persons from the valid group than
from the invalid group.

Illustrated twelve times, Model 7 is representative of the situation
where there is a difference in predictor performance but no
difference in criterion performance for the two groups. Also, the
predictor-criterion correlations are significant for only one subgroup.
It is interesting to note that the predictor is valid for the Negro
sample in eleven out of the twelve cases.

The  final  model  illustrated  in  the  Negro-white  comparison 
was  Model  8.  Performance  of  white  employees  is  higher  than  Negroes 
not only on the tests of Verbal and Arithmetic Reasoning, but also
on  the  ratings  of  Learning  Ability.  The  Arithmetic  Reasoning  Test 
was  valid  for  the  white  sample  while  the  Verbal  Reasoning  Test 
was  valid  for  the  Negro  sample. 

Forty-two illustrations of Model 5 were found in the comparisons
of the white and Latin samples. The two ethnic groups are
approximately equal on the criterion measures and differ only on
one predictor, Verbal Reasoning. Since none of the predictors are
valid for the Latin sample, any significant correlation in the
white sample (except Verbal Reasoning) produces a Model 5.

Only one additional model appeared in the white-Latin comparisons.
Model 7 was illustrated in the relationship between Verbal
Reasoning and ratings of Accuracy.

The models above were identified by whether the correlation between a test and a criterion was significantly greater than zero in neither, both, or one of the subgroups. When the additional criterion of a significant difference between validity coefficients for the two racial groups is applied (this criterion applies only to Models 5 through 10), only four models emerge. Two Model 7 cases meet this additional criterion, namely, the correlations of Verbal Reasoning and of Clerical I (10 minutes) with ratings of Human Relations in the Negro-white comparisons. Two models in the Latin-white comparisons satisfy this additional criterion: the relationships between Clerical I (both ten-minute forms) and ratings of Learning Ability. The superscript a in Tables 35 and 36 indicates those models which meet this additional criterion.
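
The additional criterion above requires a test of the difference between two independent validity coefficients. The report does not state which test was used; a common choice is Fisher's r-to-z transformation, sketched below under that assumption. The function name and the example correlations are hypothetical and are not taken from Tables 35 or 36.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Test H0: two independent population correlations are equal.

    Uses Fisher's r-to-z transformation (an assumed method; the report
    does not name the test it applied). Returns (z, two-sided p).
    """
    z1 = math.atanh(r1)  # Fisher transform of the first coefficient
    z2 = math.atanh(r2)  # Fisher transform of the second coefficient
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p


# Hypothetical validity coefficients for two subgroups of unequal size.
z_stat, p_value = fisher_z_difference(0.45, 99, -0.20, 22)
```

Because the standard error depends on both sample sizes, a small subgroup makes a significant difference between coefficients harder to detect than a significant coefficient within the larger subgroup.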

Table 37 presents the results of the regression tests for the analysis of covariance (Potthoff, 1966). This analysis simultaneously tests the hypothesis that the regression slopes and intercepts are equal for the three ethnic groups. Inspection of Table 37 reveals that using this method of analysis only two relationships demonstrated bias, as indicated by a significant F ratio. Both forms of the ten-minute clerical test were biased in predicting ratings of Learning Ability. The significant F ratio indicated that a common regression slope could not be used with the three ethnic groups.
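
The idea of a simultaneous test of slopes and intercepts can be sketched as a comparison of two fits: one regression line per group against a single common line. The code below is a generic extra-sum-of-squares F test written for illustration; it is an assumption that this captures the spirit of the Potthoff (1966) procedure, and it does not reproduce that paper's separate F ratios. The function names and data are hypothetical.

```python
def _sse_line(xs, ys):
    """Residual sum of squares of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return syy - sxy ** 2 / sxx


def coincidence_f(groups):
    """F test that all groups share one regression line.

    groups: list of (xs, ys) pairs, one per subgroup.
    Returns (F, df1, df2), with df1 = 2(k - 1) and df2 = N - 2k.
    """
    k = len(groups)
    n_total = sum(len(xs) for xs, _ in groups)
    # Full model: a separate slope and intercept for every group.
    sse_full = sum(_sse_line(xs, ys) for xs, ys in groups)
    # Reduced model: one common slope and intercept.
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    sse_reduced = _sse_line(all_x, all_y)
    df1 = 2 * (k - 1)
    df2 = n_total - 2 * k
    f = ((sse_reduced - sse_full) / df1) / (sse_full / df2)
    return f, df1, df2


# Hypothetical data: the predictor relates positively to the criterion
# in one group and negatively in the other.
group_1 = ([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1])
group_2 = ([1, 2, 3, 4, 5], [5.0, 4.1, 2.9, 2.1, 0.9])
f_ratio, df_num, df_den = coincidence_f([group_1, group_2])
```

A large F here says only that a single line does not fit all groups; deciding whether the slopes, the intercepts, or both differ requires the follow-up tests.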


Background  Data  -  Clerical  I 

Table 38 presents the biographical data for the Clerical I job classification sample. Eight employees of Mexican-American extraction were included in the original sample. A separate subgroup analysis was not performed on this ethnic group since it was too small to make reliable comparisons. White employees in this sample were older and had longer company service than their Negro counterparts. The educational level of the Negro sample, however, was approximately two years above that of the white sample.


Table 38: Biographical Data - Clerical I

Variable             Group    Mean      SD      N    t(1)

Age                  Total    35.19   13.57    129
                     White    37.77   13.84     99
                     Negro    28.00    7.63     22   4.50**

Tenure (Years)       Total     2.90    1.06    129
                     White     3.04    1.09     99
                     Negro     2.41     .80     22   2.52*

Education (Years)    Total    10.67    1.87    129
                     White    10.31    1.69     99
                     Negro    12.55    1.60     22   5.63**

*p<.05
**p<.01
(1) t ratios are between the means of the white and Negro samples.
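
A t ratio of this kind can be recomputed from the tabled means, standard deviations, and N's alone. The sketch below shows both the pooled-variance and the unequal-variance forms; the report does not say which was used, so neither is presented as the authors' method. The function name is hypothetical, and the inputs are the Age row of Table 38.

```python
import math


def t_from_summary(m1, s1, n1, m2, s2, n2, pooled=True):
    """t ratio for the difference between two independent means.

    pooled=True uses the pooled-variance standard error;
    pooled=False uses the unequal-variance (Welch) standard error.
    """
    if pooled:
        sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    else:
        se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (m1 - m2) / se


# Age row of Table 38: white (37.77, 13.84, N=99), Negro (28.00, 7.63, N=22).
t_pooled = t_from_summary(37.77, 13.84, 99, 28.00, 7.63, 22, pooled=True)
t_unequal = t_from_summary(37.77, 13.84, 99, 28.00, 7.63, 22, pooled=False)
```

When the two standard deviations differ as much as they do for age, the two forms give noticeably different values, so the choice of formula matters when checking tabled t ratios.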
Predictor Comparisons

Mean predictor scores for the two racial groups are presented in Table 39. There were no significant differences between the mean performance of the two racial groups on any of the predictors.

Criterion  Comparisons 

Table 40 presents the mean criterion scores for the white and Negro samples. As with the predictor scores, there were no significant differences between the two samples on any of the mean criterion scores.

Validity 

The correlations between the predictors and criteria were rather disappointing, as indicated by inspection of Table 41. In fact, out of 180 possible relationships, only 41 were significant at the .05 level. Furthermore, of the 60 white-Negro comparisons, in only one case was the correlation significant for both racial groups.

Despite  a  considerable  differential  in  sample  size,  both  racial  groups 
appear  equally  predictable.  The  rating  of  Work  Speed  was  the  most  predictable 
criterion  for  both  racial  groups. 

Models  Illustrated 

Nineteen cases of Model 5 were illustrated in this sample. The number in parentheses below the correlation for the Negro sample in Table 41 indicates the model represented. Model 5 is illustrative of the situation where no significant mean differences exist between the two racial groups on either the predictor or criterion, but the test is valid for only one racial group. In eight of the nineteen cases, the Negro group was the predictable racial group.

The relationship between Verbal Reasoning and ratings of Work Speed and the relationship between Clerical I (10 minutes) and ratings of Overall Effectiveness were the only illustrations of Model 5 which remained when the additional criterion of a significant difference between validity coefficients was utilized. The superscript a in Table 41 indicates those models which meet this additional criterion. In both of these cases the validity coefficient


Table 39: Predictors - Means, Standard Deviations, N's, and Tests of Significance of Mean Differences - Clerical I

Predictor                 Group     Mean      SD      N    t(1)

Verbal Reasoning          Total    22.03    9.73    129
                          White    21.8?    9.69     99
                          Negro    25.27    9.49     22    1.49

Arithmetic Reasoning      Total    23.57    8.18    129
                          White    23.44    8.62     99
                          Negro    25.23    6.50     22     .91

Clerical I                Total    48.26   11.46    129
5 minutes                 White    48.05   10.66     99
                          Negro    49.27   15.12     22     .44

Clerical I                Total   100.46   20.37    129
10 minutes                White    99.59   19.40     99
                          Negro   105.91   24.38     22    1.30

Clerical II               Total    57.80   11.29    129
5 minutes                 White    57.91   10.57     99
                          Negro    57.91   15.13     22     .00

Clerical II               Total   111.88   19.63    129
10 minutes                White   111.90   18.27     99
                          Negro   113.27   26.64     22     .29

Clerical I (R-W)          Total    41.63   13.17    129
5 minutes                 White    41.29   12.09     99
                          Negro    42.68   17.7?     22     .44

Clerical I (R-W)          Total    89.91   22.93    129
10 minutes                White    89.22   21.57     99
                          Negro    97.00   25.96     22    1.65

Clerical II (R-W)         Total    51.16   12.56    129
5 minutes                 White    51.39   12.10     99
                          Negro    50.73   14.91     22     .22

Clerical II (R-W)         Total    99.12   22.45    129
10 minutes                White    99.46   21.88     99
                          Negro    98.55   26.62     22     .17

(1) t ratios are between the means of the white and Negro samples.
Table 40: Criteria - Means, Standard Deviations, N's, and Tests of Significance of Mean Differences - Clerical I

Criterion                 Group     Mean      SD      N    t(1)

Accuracy                  Total     3.99     .98    129
                          White     3.89     .99     99
                          Negro     4.00     .8?     22     .48

Accuracy Under            Total     3.90     .95    129
Pressure                  White     3.86     .91     99
                          Negro     3.86    1.04     22     .00

Work Speed                Total     3.9?    1.03    129
                          White     3.8?    1.06     99
                          Negro     4.??     .96     22    1.??

Learning Ability          Total     3.97     .98    129
                          White     3.86     .94     99
                          Negro     4.14    1.04     22    1.23

Human Relations           Total     3.94     .92    129
                          White     3.96     .96     99
                          Negro     3.91     .81     22     .23

Overall Effectiveness     Total     4.06    1.00    129
                          White     4.02    1.03     99
                          Negro     4.09     .84     22     .13

(1) t ratios are between the means of the white and Negro samples.
was not significant for the total group. Thus, traditional validation procedures,
using  only  total  group  analysis,  would  result  in  the  elimination  of  potentially 
valid  predictors. 

Table 42 presents the results of the regression tests of the analysis of covariance (Potthoff, 1966). None of the F ratios was significant, indicating that no bias was present. It should be noted that using this method the two relationships mentioned above which met the additional criterion of a significant difference between validity coefficients fail to demonstrate bias.
Background Data - Machine Clerical I and II


Biographical  data  for  the  combined  job  classifications  Machine  Clerical 
I  and  II  is  presented  below. 

The original sample contained six employees of Mexican-American extraction. These subjects were not included in the subgroup analysis since the sample was too small to make reliable comparisons.

The trend which has occurred throughout Study 6 was again demonstrated in this combined job classification. White employees were older and had more company service than their Negro counterparts. The mean educational level of the Negro employees, as reported on the application blank, was significantly higher than the educational level of the white employees.

Table 43: Biographical Data - Machine Clerical I and II

Variable             Group    Mean      SD      N    t(1)

Age                  Total    29.74   11.20     91
                     White    32.37   12.65     60
                     Negro    25.?5    5.97     31   3.50**

Tenure (Years)       Total     2.75    1.25     91
                     White     3.25    1.24     60
                     Negro     1.84     .69     31   6.89**

Education (Years)    Total    11.65    1.17     91
                     White    11.37    1.22     60
                     Negro    12.10     .87     31   3.?5**

**p<.01
(1) t ratios are between the means of the white and Negro samples.


Predictor  Comparisons 


Mean  predictor  scores  for  the  two  ethnic  groups  are  presented  in  Table 
44.  White  employees  scored  significantly  higher  than  Negro  employees  on 
Clerical  Test  I.  This  difference  occurred  both  on  the  5  and  10  minute  time 
limit  as  well  as  on  both  the  corrected  (guessing  factor)  and  non-corrected 
scores. 

No  significant  differences  existed  between  the  two  racial  groups  on  any 
of  the  other  predictor  measures. 

Criterion  Comparisons 

As shown in Table 45, the mean performance ratings of the two ethnic groups were approximately equal on four of the six criteria. White employees, however, had higher mean performance ratings on both Human Relations and Overall Effectiveness.

Because of the differential length of service for the two ethnic groups, correlations were computed between tenure and the various criteria. No significant correlations emerged from this analysis, indicating that job experience was not a major factor contributing to the obtained criterion differences for the two ethnic groups.

Validity 

Validity coefficients for the two racial groups are presented in Table 46. Inspection of the table reveals a distinct differential validity pattern for the two racial groups. In fact, the predictor correlated positively with the criterion for the white sample but correlated negatively for the Negro sample in a large number of the predictor-criterion relationships.

Examining specific predictors, we find that the Verbal Reasoning test did not predict any of the criteria for either racial group. Likewise, the Arithmetic Reasoning Test possessed little validity for either racial group. The clerical tests, on the other hand, predicted most criteria for both ethnic groups.


Table 44: Predictors - Means, Standard Deviations, N's, and Tests of Significance of Mean Differences - Machine Clerical I and II

Predictor                 Group     Mean      SD      N    t(1)

Verbal Reasoning          Total    24.45    9.75     97
                          White    25.23   10.41     60
                          Negro    24.13    8.90     31     .5?

Arithmetic Reasoning      Total    26.97    7.86     97
                          White    27.82    8.82     60
                          Negro    25.26    6.12     31    1.60

Clerical I                Total    51.86   13.06     97
5 minutes                 White    54.18   14.08     60
                          Negro    46.52   10.17     31    2.66*

Clerical I                Total   109.71   24.51     97
10 minutes                White   114.32   26.14     60
                          Negro    99.13   19.08     31    2.83*

Clerical II               Total    60.34   14.37     97
5 minutes                 White    61.67   13.67     60
                          Negro    59.87   14.61     31     .57

Clerical II               Total   119.02   25.43     97
10 minutes                White   122.75   25.11     60
                          Negro   116.55   22.29     31    1.15

Clerical I (R-W)          Total    46.07   14.19     97
5 minutes                 White    48.15   14.69     60
                          Negro    41.26   12.97     31    2.18*

Clerical I (R-W)          Total   100.38   26.59     97
10 minutes                White   104.67   27.96     60
                          Negro    89.94   22.38     31    2.51*

Clerical II (R-W)         Total    54.96   14.63     97
5 minutes                 White    56.37   14.16     60
                          Negro    53.84   14.03     31     .79

Clerical II (R-W)         Total   ???.??   25.84     97
10 minutes                White   111.93   26.11     60
                          Negro   104.81   22.46     31    1.28

*p<.05
(1) t ratios are between the means of the white and Negro samples.
Table 45: Criteria - Means, Standard Deviations, N's, and Tests of Significance of Mean Differences - Machine Clerical I and II

Criterion                 Group     Mean      SD      N    t(1)

Accuracy                  Total     4.46    1.39     97
                          White     4.65    1.49     60
                          Negro     4.19    1.25     31    1.46

Accuracy Under            Total     4.21    1.28     97
Pressure                  White     4.40    1.30     60
                          Negro     3.90    1.27     31    1.73

Work Speed                Total     3.99    1.24     97
                          White     4.08    1.32     60
                          Negro     3.81    1.08     31     .97

Learning Ability          Total     4.28    1.22     97
                          White     4.43    1.27     60
                          Negro     4.10    1.14     31    1.20

Human Relations           Total     4.59    1.28     97
                          White     4.83    1.30     60
                          Negro     4.23    1.23     31    2.10*

Overall Effectiveness     Total     4.59    1.41     97
                          White     4.87    1.41     60
                          Negro     4.13    1.38     31    2.36*

*p<.05
(1) t ratios are between the means of the white and Negro samples.
Models Illustrated

Five different models were represented in this sample. The specific model illustrated is again represented by the number in parentheses below the correlation for the Negro sample in Table 46.

Illustrated in eleven cases, Model 5 represents the situation where a test has validity for one group and none for the other, yet mean performance on both the criterion and predictor is not significantly different. In a number of these situations the test correlates positively with the criteria for one racial group and negatively for the other, which tends to eliminate the validity of the test based on the total sample. Inspecting the specific illustrations of Model 5, we find that both racial groups appear equally predictable. That is, in approximately one-half of the cases the test is a valid predictor for the Negro sample.

Model 6, as illustrated in the correlation between the Clerical II test and ratings of Human Relations and Overall Effectiveness, occurred nine times. It is interesting to note that in all cases of Model 6, the test correlated significantly with the criteria for Negro employees but not for white employees.

Model 7 was the most frequently occurring model in this sample. Twelve cases were represented in the relationships between all versions of the Clerical I test and the various criteria. In all illustrations of this model, the test possessed validity only for the white sample. Using this test as a selection instrument would result in the elimination of Negro subjects whose probability of job success is equal to that of the white subjects selected, since the Negroes score lower on the predictor.

Four cases of Model 8 were illustrated. White employees scored higher on the Clerical I test and also on the criterion of Human Relations. However, the test is a valid predictor only for the Negro sample. It is somewhat ironic that even though the test is valid for the Negro sample, the probability of a Negro being selected is lower than the probability of a white individual being selected, since the Negro group scores lower on the predictor. This situation reinforces the need, not only for subgroup validation, but also for a comparison of validity coefficients as well as mean differences for the two racial groups.


The relationship between ratings of Learning Ability and performance on the Clerical I test illustrates Model 10. Although there was no difference on the criterion between the two racial groups, white employees obtained higher scores on the Clerical I test. Because the test correlated in opposite directions for the two racial groups, combining them results in no validity. Either differential or non-linear prediction is required to yield valid predictions.
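
The cancelling effect described above is easy to see with a small numerical illustration. The scores below are hypothetical and chosen so that the predictor is perfectly valid within each group but in opposite directions; they are not data from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical test scores (x) and criterion ratings (y) for two subgroups.
group_a_x, group_a_y = [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]  # positive relation
group_b_x, group_b_y = [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]  # negative relation

r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)
# Validity computed on the combined sample, as in a total-group analysis.
r_total = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)
```

In this constructed case the subgroup coefficients are +1 and -1 while the total-group coefficient is zero, which is why a total-group analysis alone would discard the test.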

The criterion used for identifying the above models was whether the correlation between the test and criterion was significantly greater than zero in neither, both, or one of the subgroups. If the additional criterion of a significant difference between validity coefficients for the two racial groups is imposed (this applies only to Models 5 through 10), the frequency of the various models illustrated changes only slightly. Only seven illustrations of Model 5 are represented using this somewhat more restrictive criterion, while the frequency of the other models remains unchanged. The superscript a in Table 46 indicates those models which meet this additional criterion.
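
The identification rule for the single-group-validity models can be written as a small lookup. The mapping below covers only Models 5 through 8 and is inferred from the way those models are described in this section (for example, Model 6 from the Clerical II and Human Relations case); it is not a reproduction of the full Bartlett and O'Leary (1969) definitions. The function and argument names are hypothetical.

```python
def classify_model(predictor_diff, criterion_diff, valid_group_1, valid_group_2):
    """Return 5, 6, 7, or 8 for a predictor-criterion pair, else None.

    predictor_diff / criterion_diff: True if the subgroup means differ
        significantly on the predictor / criterion.
    valid_group_1 / valid_group_2: True if the validity coefficient is
        significantly greater than zero in that subgroup.
    Only cases valid in exactly one subgroup are classified here.
    """
    if valid_group_1 == valid_group_2:
        # Valid in both or in neither subgroup: outside this sketch.
        return None
    table = {
        (False, False): 5,  # no mean differences on predictor or criterion
        (False, True): 6,   # criterion means differ, predictor means do not
        (True, False): 7,   # predictor means differ, criterion means do not
        (True, True): 8,    # both predictor and criterion means differ
    }
    return table[(predictor_diff, criterion_diff)]


# Example: predictor means differ, criterion means do not, and the test
# is valid in the first subgroup only.
example_model = classify_model(True, False, True, False)
```

The additional criterion of a significant difference between the two validity coefficients would be applied as a separate check on each classified case.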

Table 47 presents the results of the regression tests for the analysis of covariance (Potthoff, 1966). The results of this analysis conformed with the more restrictive definition of bias (i.e., no bias was demonstrated unless the validity coefficients for the two racial groups differed significantly). It should be noted that this analysis yielded significant F ratios in cases where the validity coefficients for the two racial groups were not significant but there was a significant difference between the two validity coefficients.
Table 47: Analysis of Covariance for Homogeneity of Regression
Biographical Data - Miscellaneous Clerical

Table 48 presents biographical data on the remaining clerical positions. White employees in this sample are older and have longer company service than Negro employees. The Negro employees' educational level is approximately a year higher than the educational level of the white employees.

Table 48: Biographical Data - Miscellaneous Clerical

Variable             Group    Mean      SD      N    t(1)

Age                  Total    31.87   12.44    130
                     White    33.13   13.02    106
                     Negro    26.29    7.32     24   3.44**

Tenure (Years)       Total     2.91    1.06    130
                     White     3.04    1.05    106
                     Negro     2.29     .91     24   3.26**

Education (Years)    Total    11.44    1.38    129
                     White    11.27    1.44    105
                     Negro    12.21     .72     24   4.??**

**p<.01
(1) t ratios are between the means of the white and Negro samples.

Predictor Comparisons

Table 49 presents the mean predictor data for the total, white, and Negro samples. Mean predictor performance for the two racial groups is approximately equal across all predictors.

Criterion Comparisons

Mean criterion scores for the two racial groups are presented in Table 50. As with the predictor comparisons, no significant mean differences were found between the two racial groups on any of the criterion measures.


Table 49: Predictors - Means, Standard Deviations, N's, and Tests of Significance of Mean Differences - Miscellaneous Clerical

Predictor                 Group     Mean      SD      N    t(1)

Verbal Reasoning          Total    25.08    9.04    130
                          White    25.45    9.26    106
                          Negro    23.42    7.98     24     .99

Arithmetic Reasoning      Total    25.98    7.90    130
                          White    26.04    8.26    106
                          Negro    25.71    6.21     24     .18

Clerical I                Total    53.68   13.35    130
5 minutes                 White    54.46   14.11    106
                          Negro    50.25    8.74     24    1.83

Clerical I                Total   111.14   22.7?    130
10 minutes                White   112.48   23.89    106
                          Negro   105.21   15.49     24    1.83

Clerical II               Total    62.10   13.28    130
5 minutes                 White    62.57   13.74    106
                          Negro    60.04   11.00     24     .84

Clerical II               Total   121.42   22.81    130
10 minutes                White   121.58   22.82    106
                          Negro   120.67   25.21     24     .17

Clerical I (R-W)          Total    48.10   13.86    130
5 minutes                 White    48.74   14.74    106
                          Negro    45.29    8.68     24    1.49

Clerical I (R-W)          Total   102.18   24.04    130
10 minutes                White   103.27   25.51    106
                          Negro    97.33   15.56     24    1.45

Clerical II (R-W)         Total    55.94   13.97    130
5 minutes                 White    56.26   14.56    106
                          Negro    54.54   11.15     24     .54

Clerical II (R-W)         Total   109.48   24.75    130
10 minutes                White   109.45   25.16    106
                          Negro   109.58   23.23     24     .02

(1) t ratios are between the means of the white and Negro samples.



Table 50: Criteria - Means, Standard Deviations, N's, and Tests of Significance of Mean Differences - Miscellaneous Clerical

Criterion                 Group     Mean      SD      N    t(1)

Accuracy                  Total     4.31     .91    130
                          White     4.25     .91    106
                          Negro     4.54     .8?     24    1.41

Accuracy Under            Total     4.13     .99    130
Pressure                  White     4.06     .99    106
                          Negro     4.46     .93     24    1.21

Work Speed                Total     4.16    1.04    130
                          White     4.19     .99    106
                          Negro     4.0?    1.27     24     .63

Learning Ability          Total     4.12     .82    130
                          White     4.1?     .79    106
                          Negro     4.??     .97     24     .21

Human Relations           Total     4.27     .93    130
                          White     4.26     .91    106
                          Negro     4.29    1.04     24     .14

Overall Effectiveness     Total     4.28     .90    130
                          White     4.23     .88    106
                          Negro     4.?4     .98     24    1.92

(1) t ratios are between the means of the white and Negro samples.
Because the rating criteria were confounded with differential tenure for the two racial groups, correlations between tenure and ratings were computed. The nonsignificant correlations obtained indicated that the mean criterion ratings for the Negro sample would not have increased substantially if they had been on the job as long as the white sample.

Validity 

Validity coefficients for the total, white, and Negro samples are presented in Table 51. The most striking characteristic of this analysis was the general lack of validity exhibited by the predictors for either racial group. The heterogeneity of job classifications included in this sample may have been a major factor contributing to this general lack of validity.

Only two predictors show validity for the racial subgroups. The Arithmetic Reasoning Test predicted ratings of Accuracy and Learning Ability for Negro employees but not for white employees. Clerical Test II, on the other hand, predicted ratings of Accuracy and Work Speed for white employees, but not for Negro employees.

Models  Illustrated 

Model 5, the only model illustrated in this sample, was represented five times. The number in parentheses below the correlation for the Negro sample in Table 51 indicates the relationship represented.

Although the performances of the white and Negro samples were approximately equal on all the predictor and criterion measures, the Arithmetic Reasoning test was a valid predictor of ratings of Accuracy and Learning Ability for the Negro sample only. Thus, the test may be used with some confidence to select Negroes but is not appropriate for the selection of white employees. In contrast, Clerical II (R-W, 10 min.) predicted ratings of Accuracy and Work Speed for the white sample but not the Negro sample. Likewise, Clerical II (10 min.) is a valid predictor of Work Speed for the white sample only.

Applying the additional criterion of a significant difference between validity coefficients for the two racial groups eliminated these five examples of Model 5.

Because  only  a  few  validity  coefficients  reached  a  statistically 
significant  level,  caution  should  be  exercised  in  the  interpretation  of  this  study. 
A rather large number of correlations were examined, and some statistically significant coefficients would be expected by chance.
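
The size of this chance effect can be gauged with simple arithmetic. The sketch below is an illustration only: the number of correlations examined in this sample is not stated here, so the count of 180 is borrowed from the Clerical I sample, and the second function additionally assumes the tests are independent, which correlated predictors and criteria would violate. Function names are hypothetical.

```python
def expected_chance_hits(n_tests, alpha=0.05):
    """Expected number of significant results if every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Probability of one or more chance hits, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative count: the 180 relationships examined for the Clerical I sample.
chance_hits = expected_chance_hits(180)
chance_any = prob_at_least_one(180)
```

With 180 tests at the .05 level, roughly nine significant coefficients would be expected even if no test had any validity.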

Table 52 presents the results of the regression tests of the analysis of covariance (Potthoff, 1966). None of the F ratios was significant, indicating that no bias was present.
Study 7: Keypunch Operators


Sample 

Study 7 consists of 135 keypunch operators, of whom 107 were white and 28 were Negro. As shown in Table 53, the two ethnic groups were approximately equal in terms of age, but white employees had longer company service. Again, Negro employees had a significantly higher educational level than their white counterparts.

Table 53: Biographical Data - Keypunch Operators

Variable                Group    Mean      SD      N    t(1)

Age                     Total    26.12    8.97    135
                        White    26.26    9.74    107
                        Negro    25.57    5.13     28    .50

Tenure (Months)         Total    24.03   26.29    135
                        White    25.54   28.79    107
                        Negro    18.25   10.36     28   2.12*

Education (in years)    Total    11.82    1.09    135
                        White    11.75    1.18    107
                        Negro    12.11     .57     28   2.27*

*p<.05
(1) t ratios are between the means of the white and Negro samples.

Predictor  Comparisons 

Scores on four predictor measures were obtained. The first was a company-developed test of mental alertness. Using this measure, two subscores were obtained: a verbal and a quantitative score. The sum of these two scores provided a measure of general mental alertness.

Secondly, the Thurstone Temperament Schedule was administered. This personality inventory is designed to measure the following seven aspects of temperament:

Active (A)        Emotionally Stable (Es)
Vigorous (V)      Sociable (S)
Impulsive (I)     Reflective (R)
Dominant (D)
Table 54: Predictors - Means, Standard Deviations, N's, and Tests of Significance of Mean Differences - Keypunch Operators

Predictor                 Group     Mean      SD      N    t(1)

Test of Mental Alertness

Verbal                    Total    34.10   13.68    128
                          White    34.60   14.37    104
                          Negro    31.96   10.??     24     .84

Quant.                    Total    17.70    6.39    128
                          White    18.04    6.79    104
                          Negro    16.21    4.02     24    1.??

Total                     Total    51.77   17.40    128
                          White    52.63   18.45    104
                          Negro    48.17   11.44     24    1.49

Clerical                  Total   115.77   20.32    128
                          White   116.41   20.04    104
                          Negro   113.00   21.74     24     .73

Clerical (R-W)            Total   107.19   21.53    128
                          White   108.49   20.43    104
                          Negro   101.54   25.51     24    1.42

Arithmetic                Total    26.72    7.28    128
                          White    27.15    7.59    104
                          Negro    24.83    6.60     24    1.40

Arithmetic (R-W)          Total    22.62    8.38    128
                          White    23.18    8.49    104
                          Negro    20.17    7.56     24    1.59

Thurstone Temperament Schedule

Active                    Total     9.36    3.05    107
                          White     9.60    3.10     88
                          Negro     8.21    2.59     19    1.81

Vigorous                  Total     7.74    3.20    107
                          White     7.67    3.39     88
                          Negro     8.11    2.16     19     .70

Impulsive                 Total    11.12    3.27    107
                          White    11.25    3.33     88
                          Negro    10.53    3.06     19     .86

Dominant                  Total     9.55    4.67    107
                          White     9.40    4.53     88
                          Negro    10.26    5.35     19     .72

Emotionally Stable        Total    11.53    3.31    107
                          White    11.50    3.36     88
                          Negro    11.68    3.15     19     .22

Sociable                  Total    13.05    3.36    107
                          White    13.06    3.44     88
                          Negro    13.00    3.02     19     .07

Reflective                Total     6.99    3.13    107
                          White     7.07    3.3?     88
                          Negro     6.63    ?.67     19     .82

(1) t ratios are between the means of the white and Negro samples.


Two company-developed tests were also included. The Clerical Aptitude Test is a measure of perceptual speed and accuracy, while the Arithmetic Skills Test is a measure of the ability of the employee to check the accuracy of simple arithmetic problems. Two scoring procedures were used with the Arithmetic and Clerical Tests: (1) the number of correct responses, and (2) the number correct minus the number of incorrect responses.
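
The two scoring procedures can be sketched as follows. The treatment of omitted items is an assumption (they are counted as neither right nor wrong), since the report does not describe it, and the function name, answer key, and responses are hypothetical.

```python
def score_responses(responses, key):
    """Return (number right, rights-minus-wrongs) for one answer sheet.

    responses: list of answers, with None marking an omitted item.
    key: list of correct answers, same length as responses.
    Omitted items are assumed to count as neither right nor wrong.
    """
    right = sum(1 for r, k in zip(responses, key) if r is not None and r == k)
    wrong = sum(1 for r, k in zip(responses, key) if r is not None and r != k)
    return right, right - wrong


# Hypothetical five-item sheet: two right, two wrong, one omitted.
rights, r_minus_w = score_responses(["a", "b", None, "d", "a"],
                                    ["a", "c", "c", "d", "b"])
```

Under the second procedure each error cancels one correct answer, so an employee who works quickly but carelessly scores lower than under the first.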

Mean scores for the two racial groups are presented in Table 54. There were no significant differences between the performance of the two racial groups on any of the predictors.

Criterion  Comparisons 

Employees were rated by their immediate supervisor in committee with their department head on a company-developed rating scale. This nine-point rating scale covered the following dimensions: (1) Concentration, (2) Learning Ability, (3) Work Sharing, (4) Error Detection, (5) Social Interaction, and (6) Overall Effectiveness. Two objective criteria were also available: Keypunching Speed and Error Percentage.

The  raw  ratings  were  converted  to  standard  scores  within  raters  in  all 
cases  where  sufficient  numbers  of  people  were  rated  by  a  pair  of  raters.  This 
was  an  attempt  to  compensate  for  errors  of  leniency  and  central  tendency. 
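
Standardizing within raters can be sketched as below. Several details are assumptions rather than facts from the report: the rescaling to a mean of 50 and a standard deviation of 10 is suggested only by the standardized values in Table 55, the cutoff for a "sufficient" number of ratees (min_n) is hypothetical, and the report does not say how ratings from raters below the cutoff were handled (they are simply skipped here).

```python
import math
from collections import defaultdict


def standardize_within_raters(ratings, min_n=3, mean=50.0, sd=10.0):
    """Convert raw ratings to standard scores separately for each rater.

    ratings: list of (rater_id, employee_id, raw_rating) tuples.
    min_n: assumed minimum number of ratees a rater needs to be standardized.
    Returns {employee_id: standard score}; raters with too few ratees or
    with no variation in their ratings are skipped.
    """
    by_rater = defaultdict(list)
    for rater, employee, raw in ratings:
        by_rater[rater].append((employee, raw))

    scores = {}
    for rows in by_rater.values():
        raws = [raw for _, raw in rows]
        n = len(raws)
        if n < min_n:
            continue
        m = sum(raws) / n
        s = math.sqrt(sum((x - m) ** 2 for x in raws) / n)
        if s == 0:
            continue
        for employee, raw in rows:
            scores[employee] = mean + sd * (raw - m) / s
    return scores


# Hypothetical ratings: rater r1 rated three employees, rater r2 only one.
standard = standardize_within_raters(
    [("r1", "e1", 4), ("r1", "e2", 6), ("r1", "e3", 8), ("r2", "e4", 5)]
)
```

Because every rater's scores are re-centered on the same mean, a lenient rater and a severe rater contribute comparable values, which is the compensation for leniency and central tendency the text describes.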

Mean criterion scores for the two racial groups are presented in Table 55. Considering the standardized criteria, we find that white employees obtained higher ratings than Negro employees on Concentration. No significant differences existed between the two racial groups on any of the other standardized criteria.

White employees obtained higher ratings than Negro employees on two raw score ratings: Error Detection and Social Interaction. No mean score differences existed between the two racial groups on the standardized objective criteria.

Validity 

Validity coefficients for the two racial groups are presented in Table 56. All correlations have been controlled for tenure when appropriate. In general, the coefficients were rather low. The most promising tests were the Arithmetic and Clerical tests developed by the firm's psychologists.

The  most  predictable  criteria  were  ratings  of  Learning  Ability  and  the 
Table 55: Criteria - Means, Standard Deviations, N's, and Tests of Significance of Mean Differences - Keypunch Operators

Criterion                 Group     Mean      SD      N    t(1)

Ratings - Standardized

Concentration             Total    50.29    9.95    131
                          White    51.26   10.21    103
                          Negro    46.72    7.98     28    2.16*

Learning Ability          Total    49.95    9.70    131
                          White    50.17    9.63    103
                          Negro    49.18   10.08     28     .4?

Work Sharing              Total    49.92    9.93    131
                          White    50.47    9.90    103
                          Negro    47.91    9.95     28    1.20

Error Detection           Total    50.03    9.93    131
                          White    50.66   10.09    103
                          Negro    47.71    9.25     28    1.59

Social Interaction        Total    50.58   10.12    131
                          White    51.25   10.28    103
                          Negro    47.19    8.97     28    1.80

General Overall           Total    49.98    9.96    131
Effectiveness             White    50.39   10.33    103
                          Negro    48.50    8.49     28     .88

Ratings - Raw Score

Concentration             Total     6.01    1.47    131
                          White     6.03    1.58    103
                          Negro     5.93     .98     28     .43

Learning Ability          Total     5.75    1.71    131
                          White     5.73    1.68    103
                          Negro     5.82    1.87     28     .25

Work Sharing              Total     5.65    1.47    131
                          White     5.76    1.?5    103
                          Negro     5.21    ?.57     28    1.73

Error Detection           Total     5.62    1.79    131
                          White     5.84    1.69    103
                          Negro     4.82    1.96     28    2.70**

Social Interaction        Total     5.6?    1.75    131
                          White     5.95    1.67    103
                          Negro     4.68    1.70     28    3.55**

General Overall           Total     5.53    1.57    131
Effectiveness             White     5.64    1.62    103
                          Negro     5.?4    ?.55     28    1.48

Standardized Objective Criteria

Speed                     Total    49.99    9.95    100
                          White    49.79   10.15     79
                          Negro    50.77    9.55     21     .40

Error Percentage          Total    49.84   10.0?     94
                          White    50.45   10.75     76
                          Negro    47.28    6.51     18    1.61

(1) t ratios are between the means of the white and Negro samples.
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.’able  56:  rredIctr’-Critericr.  Ccrrelat^ons-Keypunch  Operators 
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least  predictable  criteria  were  the  objective  measures  of  Keypunching  Speed 
and  Error  Percentage. 

Models  Illustrated 

Only two different models were illustrated in this sample. The number in
parentheses below the correlation for the Negro sample in Table 56 indicates
the specific model illustrated. Model 5, the most frequently illustrated
model, was represented in 24 predictor-criterion relationships. Model 5
illustrates the situation where there is no difference on either the predictor
or criterion for the two racial subgroups and the test is valid only for one
subgroup. The relationship between ratings of Work Sharing and the Emotional
Stability scale of the Thurstone Temperament Schedule clearly illustrated this
model.

The final model illustrated in this sample was Model 6. Eight illustrations
of  this  model  occurred  but  it  was  most  clearly  illustrated  in  the  relationship 
between  the  Clerical  test  and  the  raw  score  ratings  of  Error  Detection.  The 
mean  test  performance  was  approximately  equal  for  the  two  groups  on  both  forms 
of  the  Clerical  test  but  the  white  employees  were  rated  higher  on  Error  Detection. 
The  validity  coefficient  was  significant  only  for  the  white  sample.  Total 
group  validation  procedures  would  recommend  the  use  of  the  test  for  selection 
even  though  the  test  is  clearly  not  appropriate  for  the  Negro  sample. 

The  frequency  of  the  various  models  was  greatly  reduced  when  the  additional 
criterion  of  a  significant  difference  between  validity  coefficients  was  applied. 
Only four illustrations of Model 5 met this criterion. The superscript a in
Table 56 indicates those models which met this criterion.

Table 57 presents the results of the regression tests of the analysis of
covariance (Potthoff, 1966). A significant F statistic for the test of
intercepts was obtained in a large number of the comparisons of the predictors
with the raw score ratings of Social Interaction. A significant statistic of
this kind indicates that a common intercept value could not be used for the
two ethnic groups. Only four significant F statistics for the test of slopes
were obtained. A significant statistic of this kind indicates that a common
beta weight could not be used for the two ethnic groups.

Table 57: Analysis of Covariance for Homogeneity of Regression

Section IV: Summary and Discussion

Any attempt to summarize the data presented in the preceding seven
studies is necessarily open to question and limited by the very nature
of the data. Since a basic purpose of this research project was to obtain
an estimate of the parameters of subcultural differences in the prediction
of job performance, predictor-criterion relationships across studies were
examined with respect to type of valid predictor, type of predictable
criterion, and type of subgroup for which a predictor was valid. Several
assumptions about the data were made before these comparisons were attempted.
First, within each study each predictor-criterion relationship was treated
as if it were independent of all other predictor-criterion relationships.
Thus, the intercorrelations of the predictor set and the criterion set were
ignored. Second, no attempt was made to weight the results of a study
by the sample size of the study. This served to place the emphasis on the
statistical significance of a result rather than its absolute magnitude.
This is consistent with the decision that primary attention should be paid
to the significance of validity coefficients when comparing different ethnic
subgroups because of its practical implications.

All samples in the seven studies consisted of current employees. Thus,
data were not available for the applicant populations. A further assumption
that had to be made, therefore, was that the current employees in all ethnic
subgroups were representative of their respective subgroup applicant population
with respect to the predictor-criterion relationships. Finally, the
assumption was made that there was no bias in the criterion measures.
Unfortunately, no estimates of such bias were available; therefore, all subgroup
differences and lack of differences on criteria were assumed to be
a function of actual subgroup job performance.


These  assumptions,  in  addition  to  the  fact  that  small  sample  sizes 
permitted  only  a  single  estimate  of  each  predictor-criterion  relationship 
to  be  made,  lead  to  somewhat  equivocal  statements  in  summarizing  the  data. 
All of these assumptions and restrictions must be considered when attempting
to generalize from these data.

Table 58 presents a summary of predictor mean subgroup differences and
validity with respect to type of predictor. It can be seen from Table 58
that the white subgroup (W) scored significantly higher than the non-white
subgroup (N, either Negro or Latin American) on approximately one-fourth
of the predictors. It should be remembered that a subgroup mean difference
on a predictor does not necessarily indicate that the predictor is biased
against one of the subgroups. If the difference on the predictor is associated
with a corresponding difference on the criterion measure, the predictor
may not be biased, but rather may be reflecting a difference in criterion
performance. Table 59 presents the instances of unfairness with respect
to type of predictor. Unfairness may exist when a difference on either the
predictor or criterion measure is not associated with a corresponding subgroup
difference on the other measure. From Table 59 it can be seen that
the type of test most frequently (in terms of percentage of total comparisons)
associated with instances of unfairness was the non-verbal intelligence
test. This type of test failed to predict a criterion difference 75%
of the time. The type of test which fared best with regard to unfairness
was the perceptual test. When a perceptual test was the predictor, there
was no unfairness in 84% of the predictor-criterion comparisons.

The concept of unfairness does not involve the validity of the predictor.
Of course, both fairness and validity are desirable attributes of a
predictor. In the right half of Table 58, the validity patterns of the
predictors with respect to type of predictor are presented. A most striking
fact evident from Table 58 is the large proportion of instances where the
predictor was valid for only one of the subgroups. In particular, predictors
were valid for only the white subgroup 237 times (of a total of 765
predictor-criterion comparisons) and valid for only the non-white subgroup
52 times. This contrast of frequency of subgroup validity lends support
to the commonly held hypothesis that tests tend to be valid for white persons
but not for minority group members. It must be remembered, however, that
the sample sizes of white and Negro subgroups were quite dissimilar and a
smaller correlation in terms of magnitude was required for significance with
the larger white subgroups. The perceptual tests again were superior when validity
was considered, being valid for at least one subgroup in about two-thirds of
the total comparisons and being valid for both subgroups in approximately
one-fourth of the instances. The superiority of the perceptual type of test
with respect to validity was not surprising, since most of the samples consisted
of clerical workers.
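
The effect of unequal subgroup sizes on the significance threshold can be made concrete with a short sketch. It uses the Fisher z approximation (an assumption made here for convenience; the report does not say how significance was tested), and the sample sizes 103 and 28 are simply the white and Negro Ns of the keypunch sample in Table 55, chosen for illustration. The function name `critical_r` is hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Approximate smallest |r| that reaches two-tailed significance.

    Uses the Fisher z approximation: atanh(r) * sqrt(n - 3) is treated
    as a standard normal deviate under the null hypothesis.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_crit / sqrt(n - 3))


r_needed_white = critical_r(103)  # roughly .19
r_needed_negro = critical_r(28)   # roughly .37
```

Under these assumptions a coefficient of, say, .25 would count as valid in the larger subgroup and not valid in the smaller one, even though the two coefficients are identical in magnitude.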

Table 60 presents criteria mean subgroup differences and criterion predictability
summarized over the seven studies. The white subgroup scored
significantly higher on about one-fourth of the criterion measures, and there
were no differences on the rest. Table 61 presents instances of unfairness
with respect to type of criterion. The predictability of each type of criterion
measure is given in the right half of Table 60.

Since, in all instances where either predictor or criterion subgroup
mean differences were found, the white subgroup scored higher on the measure
than the non-white subgroup, certain consistent results concerning unfairness
were found. When the difference in mean subgroup performance was on
the predictor variable only, the non-white subgroup would be discriminated


Table 60: Criterion Mean Differences and Predictability with Respect to
Type of Criterion

                               Mean Differences                 Predictable For
Type of Criterion          W>N   W<N   No Diff   Total    W(only)  N(only)   Both   Neither   Total

Attendance                   0     0       5        5         0        2        0       5        7
Termination                  0     0       1        1         0        0        0       2        2
Extension of Probation       1     0       2        3         1        0        0       3        4
Promotion                    1     0       1        2         0        0        0       2        2
Objective                    0     0       4        4         2        2        0      28       32
Rating                      18     0      53       71       233       48      136     300      717
Test Score Change            0     0       1        1         1        0        0       0        1

Total                       20     0      67       87       237       52      136     340      765

Table 61: Instances of Unfairness with Respect to Type of Criterion

                               Instances of Unfairness
                               (Differences on Only)
Type of Criterion            Predictor      Criterion      No Unfairness     Total

Attendance                    5 (71%)        0 ( 0%)          2 (29%)           7
Termination                   1 (50%)        0 ( 0%)          1 (50%)           2
Extension of Probation        2 (50%)        0 ( 0%)          2 (50%)           4
Promotion                     1 (50%)        0 ( 0%)          1 (50%)           2
Objective                     2 ( 6%)        0 ( 0%)         30 (94%)          32
Rating                       85 (12%)      122 (16%)        510 (72%)         717
Test Score Change             1 (100%)       0 ( 0%)          0 ( 0%)           1

Total                        97 (13%)      122 (16%)        546 (71%)         765


against if selection were made using a common regression equation. In
those cases where the difference was on the criterion only, the white subgroup
would be discriminated against if the common regression line were
used. Thus, the non-white subgroup was discriminated against in 13% of the
instances reported in this investigation and the white subgroup in 16% of
the instances, if the criterion of unfairness as defined previously is used
to determine discrimination. An examination of Table 61 reveals that a
rating criterion is involved in all cases of unfairness against the white
subgroup. Any conclusion reached with only a rating criterion is equivocal.

All predictor-criterion relationships were also analyzed to determine
the frequency of occurrence of the eleven different relationships presented
in the Bartlett and O'Leary (1969) model. Table 62 presents, by sample,
the frequency of each model.

Clearly, the model most often illustrated was Model 5 (no differences
on criterion or predictor, but differential validity). This is not surprising
since in a large number of the predictor-criterion relationships both
racial groups performed equally well on both the predictor and criterion,
and thus a significant correlation in either sample would produce a Model 5.
It is important to note that in a majority of the illustrations of this
model, the test was valid for the white sample and not valid for the minority
sample.

It is unlikely that these cases would produce any differential selection
rates for the ethnic groups since there was no difference in mean test performance
for the two groups. Thus, viewed in terms of equal opportunity,
these models do not appear to illustrate bias. However, subsequent mean job
performance for the two groups would be discrepant, and one might erroneously
conclude that the minority sample's ability to perform on the job was inferior
to that of the white sample. This mean difference in criterion performance
would be a direct result of an inappropriate selection procedure. The
only solution to the selection problems of Model 5 appears to be to use
the test for the group for which it is valid and to search for other valid
predictors for the group for which it is not valid.

In view of the relatively high frequency of this model, it would seem
that more research should be directed toward the development of valid predictors
for minority populations. An examination of Table 58
reinforces this belief since in a large number of the total predictor-criterion
relationships, the test was valid only for the white sample.

The second most frequently occurring model was Model 6 (mean difference
on criterion only and differential validity). In all illustrations of this
model, the white sample obtained higher ratings of job performance while
there was no difference in test performance for the two groups. The use of
a common regression line would always result in an over-prediction of job
performance for the minority group. Thus, this model does not deny opportunity
to minority group members. In fact, it systematically provides opportunity
to minority groups. It is unlikely, however, that such over-prediction
would benefit the minority group members in the long run. It is likely to
lead only to temporary employment since the minority group member would have
a low probability of success on the job.

It is also important to note that if a common regression line were
employed, one would under-predict job performance for the white sample and
thereby systematically reject qualified white applicants. This model illustrates
the fact that not all bias is against minority groups.
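
The over- and under-prediction produced by a common regression line can be sketched numerically. The example below isolates only the intercept effect (equal test scores, unequal criterion scores) and ignores the differential-validity aspect of Model 6; the data, group labels, and function names are hypothetical and are not taken from any of the seven studies.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_prediction_error(groups):
    """Mean (actual - predicted) per group under one common line.

    groups: dict mapping group name -> (test_scores, criterion_scores).
    A negative value means the common line over-predicts that group.
    """
    all_x = [x for xs, _ in groups.values() for x in xs]
    all_y = [y for _, ys in groups.values() for y in ys]
    slope, intercept = fit_line(all_x, all_y)
    errors = {}
    for name, (xs, ys) in groups.items():
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        errors[name] = sum(residuals) / len(residuals)
    return errors


# Hypothetical data: identical test scores, criterion one point apart.
tests = [10, 12, 14, 16, 18]
groups = {
    "white": (tests, [5.0, 5.5, 6.0, 6.5, 7.0]),
    "minority": (tests, [4.0, 4.5, 5.0, 5.5, 6.0]),
}
errors = mean_prediction_error(groups)
```

With these figures the common line runs midway between the two groups, so performance is under-predicted by half a point for the higher-rated group and over-predicted by the same amount for the lower-rated group.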

Model 7 was the third most frequently occurring model (mean difference
on predictor only and differential validity). As was the case with Model 5,
in most illustrations of this model the test was valid only for the white
sample. However, since the minority group scored lower on the predictor,
utilization of this test in selection is more detrimental to the minority
group member than is Model 5. Because there was a difference in mean test
performance, the minority group member has less of an opportunity to be
selected. But, perhaps more important is the fact that by using such a
test, one is systematically denying opportunities to minority group members
on the basis of a non-valid test.

Another clear illustration of unfair discrimination is represented in
Model 2 (mean difference on predictor only but equal subgroup validities).
In all illustrations of this model, opportunity would be denied to minority
group members since they score lower on the test, but perform as well as the
white sample on the job. Since the test is valid for both groups, differential
prediction is a solution to the problem. Separate regression lines
and separate expectancy tables for minority and white samples would eliminate
the unfair discrimination in this model.

Occurring as frequently as Model 2 was Model 8 (difference on
both predictor and criterion and differential validity). Since there is a
differential in both the predictor and criterion performance for the two
ethnic groups, one would expect a difference in selection rates. Valid predictions
can be made because the test identifies the lower performing minority
group members. Nonetheless, the test is certainly not appropriate for
prediction within the non-white sample.

The development of a valid predictor of job performance for minority
group members will not eliminate the differential in selection rates since
the minority group members do not perform as well as the white individuals
on the job. However, a valid predictor for the non-white sample will insure
that the most qualified minority group members will be selected.


Model 3 (difference on criterion only, equal subgroup validities),
occurring 18 times, illustrates a situation where job performance is over-predicted
for the non-white sample. Again, job opportunity is not denied
minority group members. In this instance, the bias is against the white
sample. Separate regression lines and expectancy tables will eliminate
this inequality.

Perhaps the most important finding of this phase of the research project
is the fact that Model 1 (no difference on predictor or criterion,
equal subgroup validities) occurred so infrequently. Traditional personnel
selection procedures assume that Model 1 is operative (i.e., a single
regression line can be used for all subgroups in a population). The results
of this study indicate that the traditional model is inappropriate in most
cases. Homogeneous populations are the exception rather than the rule.
Thus, it is imperative that tests be validated separately for subgroups in
a population if inadvertent discrimination is to be avoided.

Models 10 (difference on predictor only, opposite subgroup validities)
and 11 (differences on both predictor and criterion, no subgroup validity)
occurred relatively infrequently (4 and 1 times, respectively), while
Models 4 (difference on both predictor and criterion, equal subgroup validity)
and 9 (no differences on predictor or criterion but opposite validity)
did not appear. This would tend to indicate that these models are probably
rare and are not contributing a significant amount to inadvertent discrimination
in testing.
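
The eleven relationships discussed above can be collected into a small lookup. This is a sketch built only from the parenthetical descriptions given in this section, not a restatement of the Bartlett and O'Leary (1969) paper itself; the function names and the four validity labels ("equal", "differential", "opposite", "none") are hypothetical shorthand introduced here.

```python
# Keys: (difference on predictor?, difference on criterion?, validity pattern)
MODELS = {
    (False, False, "equal"): 1,
    (True, False, "equal"): 2,
    (False, True, "equal"): 3,
    (True, True, "equal"): 4,
    (False, False, "differential"): 5,
    (False, True, "differential"): 6,
    (True, False, "differential"): 7,
    (True, True, "differential"): 8,
    (False, False, "opposite"): 9,
    (True, False, "opposite"): 10,
    (True, True, "none"): 11,
}


def classify_model(predictor_diff, criterion_diff, validity):
    """Return the model number, or None if the combination is not
    one of the eleven relationships described in this section."""
    return MODELS.get((predictor_diff, criterion_diff, validity))


def unfairness(predictor_diff, criterion_diff):
    """Unfairness, as defined in this report, exists when a mean
    difference on one measure is not matched on the other."""
    if predictor_diff and not criterion_diff:
        return "predictor only"
    if criterion_diff and not predictor_diff:
        return "criterion only"
    return "none"


most_common = classify_model(False, False, "differential")
second_most_common = classify_model(False, True, "differential")
```

Read this way, Models 2, 7, and 10 involve a difference on the predictor only, and Models 3 and 6 a difference on the criterion only, which are the two forms of unfairness tabulated in Table 61.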

Two separate methods of model identification were utilized in those
situations where differential validity was demonstrated for the two racial
groups (Models 5-10). The above summary of the relative frequency of the
various models utilized the first method of model identification. All
predictor-criterion relationships in which a validity coefficient was
significant for one racial group but not significant for the other were
identified as illustrations of these models using this method.

The second method of model identification imposed an additional criterion
of a statistically significant difference between the validity
coefficients for the two racial groups. Table 63 presents a comparison
of the relative frequency of each model, using the two methods of model
identification.
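
The additional criterion of the second method, a significant difference between two validity coefficients, can be sketched with the Fisher z test for independent correlations. The report does not state which test was used, so this choice is an assumption, and the function name and the coefficients below are hypothetical; only the sample sizes (103 and 28) echo the keypunch sample.

```python
from math import atanh, sqrt
from statistics import NormalDist


def compare_validities(r1, n1, r2, n2):
    """Fisher z test for the difference between two independent
    correlations. Returns (z, two-tailed p)."""
    z = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Clearly different coefficients: the difference is significant.
z_far, p_far = compare_validities(0.50, 103, -0.10, 28)

# Similar coefficients: .25 would be significant with n = 103 and .20
# would not be with n = 28 (first method), yet the two coefficients
# do not differ significantly from each other (second method).
z_near, p_near = compare_validities(0.25, 103, 0.20, 28)
```

The second case shows why the second method identifies far fewer illustrations of each model than the first.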


Table 63

Frequency of Models Illustrated

                  Method of Model Identification

Model      Total Occurrences      Significant Differences (1)

  1               1?                        1?
  2               28                        28
  3               18                        18
  4                0                         0
  5              103                        1?
  6               60                         ?
  7               39                        1?
  8               28                         4
  9                0                         0
 10                4                         4
 11                1                         1

(1) Using the second method of model identification, Models 5
    through 10 require a significant difference between
    validity coefficients for the two ethnic groups to be
    included as an illustration of that model.


As can be seen in Table 63, the relative frequency of the various
models was greatly reduced using this additional criterion of model identification.
However, it is important to note that even with this more
stringent criterion, inadvertent test bias was demonstrated in over 25%
of the relationships.

Throughout  the  report  we  have  identified  those  models  which  met  the 
first  criterion  and  those  which  met  both  criteria.  Greater  emphasis,  how¬ 
ever,  has  been  placed  on  the  first  method  of  model  identification  because 
of its practical implications. That is, it is difficult to justify using
a  test  for  a  given  subgroup  where  it  does  not  correlate  significantly  with 
the  criterion,  despite  the  fact  that  the  correlation  may  not  differ  signif¬ 
icantly  from  a  valid  correlation  for  another  subgroup  of  the  population. 

Each predictor-criterion relationship was also analyzed using the
regression tests of the analysis of covariance (Potthoff, 1966) to test
the equality of slopes and intercepts for the ethnic groups. In general,
the results of this analysis were similar to the second method of model
identification. However, the analysis of covariance method identified
regression intercept differences even in those cases where the test possessed
no validity for either subgroup.
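
The tests of equal slopes and equal intercepts can be sketched with the textbook analysis-of-covariance F ratios for two groups and one predictor. This is a generic formulation and may differ in detail from the statistics defined by Potthoff (1966); the function names and the data are hypothetical, and the F values would still have to be referred to an F table with the degrees of freedom returned.

```python
def _sums(x, y):
    """Return n and the corrected sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return n, sxx, syy, sxy


def homogeneity_of_regression(x1, y1, x2, y2):
    """F ratios for equality of slopes and of intercepts in two groups."""
    n1, sxx1, syy1, sxy1 = _sums(x1, y1)
    n2, sxx2, syy2, sxy2 = _sums(x2, y2)
    n = n1 + n2

    # Separate slope and intercept in each group.
    sse_separate = (syy1 - sxy1 ** 2 / sxx1) + (syy2 - sxy2 ** 2 / sxx2)
    # Common slope, separate intercepts.
    sse_common_slope = (syy1 + syy2) - (sxy1 + sxy2) ** 2 / (sxx1 + sxx2)
    # One regression line for everyone.
    _, sxx, syy, sxy = _sums(list(x1) + list(x2), list(y1) + list(y2))
    sse_one_line = syy - sxy ** 2 / sxx

    f_slopes = (sse_common_slope - sse_separate) / (sse_separate / (n - 4))
    f_intercepts = (sse_one_line - sse_common_slope) / (sse_common_slope / (n - 3))
    return {
        "F_slopes": f_slopes, "df_slopes": (1, n - 4),
        "F_intercepts": f_intercepts, "df_intercepts": (1, n - 3),
    }


# Hypothetical data: same slope in both groups, criterion 3 points lower in group 2.
x = [1, 2, 3, 4, 5, 6]
y_group1 = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
y_group2 = [v - 3 for v in y_group1]
result = homogeneity_of_regression(x, y_group1, x, y_group2)
```

In this constructed case the slope test is essentially zero while the intercept test is very large, which corresponds to the situation in which a common beta weight is acceptable but a common intercept is not.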

Table 64 presents the frequency of the various models for each of the
six general classifications of predictor variables. As can be seen in
the table, it is not possible to predict which type of test is likely to
produce a certain model. That is, no model was clearly associated with
a particular type of test. Although the perceptual tests illustrate the
most models, they were also the most frequently utilized test, since most
jobs were clerical in nature. The non-verbal I.Q. tests do not reduce
bias, as is sometimes assumed. The non-verbal I.Q. tests illustrated biased
relationships in 33 out of a possible 44 predictor-criterion relationships.

Kirkpatrick, Ewen, Barrett and Katzell (1968) have developed a useful
means of summarizing data concerning the relationship between subgroup
membership and predictor validity. The data from the seven studies of
the present investigation have been organized according to the procedure
of Kirkpatrick et al., and are presented in Table 65. For each sample,
a number of tests were compared with a number of criteria; the product of
these numbers is the number of instances where comparisons of test fairness
and validity could be made, and it is listed in column 1 of Table 65.
In column 2 appears the number of these predictor-criterion comparisons
in which a significant mean difference between subgroups in either a test
or a criterion was not associated with a significant mean difference in
the other, i.e., the number of instances in which unfairness, as defined
in this report, occurred. Column 3 shows the number of predictor-criterion
comparisons where the test was valid for at least one of the subgroups. It
might be noted that the smaller the number in column 3 is in comparison to
the number in column 1, the less appropriate are the tests as a whole for
predicting the job success of any of the subgroups (Kirkpatrick et al.,
1968). Column 4 presents the number of instances in which the test was
valid in one subgroup but not in the other. The larger the number in column
4 in comparison to the number in column 3, the greater the evidence
that differential validity in population subgroups may exist. Column 5
indicates differential validity in the sense of the number of instances in
which the validity coefficient between a given predictor and criterion
significantly differs in magnitude for the two subgroups. It is useful to
compare columns 4 and 5 with column 3, as well as column 1, when attempting
to draw a conclusion about the relative frequency of differential validity,
since column 1 contains many instances where the tests lacked validity in
either subgroup. Such instances may be regarded as irrelevant to the issue
of differential validity, as the tests were apparently inappropriate for
these situations (Kirkpatrick et al., 1968).

In  summary,  within  the  limitations  of  the  data  gathered  and  the  assump¬ 
tions  required,  the  results  of  the  present  study  indicate  that  test  bias 
is  clearly  present  in  a  large  number  of  cases  where  heterogeneous  groups 
are  combined  in  making  predictions.  However,  it  is  erroneous  to  conclude 
that  all  inadvertent  test  bias  denies  opportunities  to  minority  group  mem¬ 
bers.  The  present  study  has  demonstrated  the  need  to  validate  tests  sepa¬ 
rately  for  minority  and  majority  group  members.  The  traditional  validation 
model  which  assumes  homogeneous  populations  is  clearly  inappropriate. 


REFERENCES 


Anastasi,  Anne.  Some  implications  of  cultural  factors  for  test  construction. 

In  Anastasi  (Ed.),  Testing  Problems  in  Perspective.  Washington,  D.C.: 
American  Council  on”Educatio~n,  1965". 

Amrine,  M.  The  196b  congressional  inquiry  into  testing:  a  commentary. 

American  Psychologist,  1969,  20 ,  899-670. 

APA  Task  Force  on  Employment  Testing  of  Minority  Groups.  Job  Testing  and 
the  Disadvantaged.  American  Psychologist,  1909,  24,  637-690. 

Arvey,  R.  D.  Unfair  discrimination  and  tests:  some  issues.  Unpublished 
paper.  University  of  Minnesota,  1967. 

Banas,  P.  A.  Moderator  variables:  a  review  of  the  literature.  U.  S.  Army 
Personnel  Research  Office,  Research  Memorandum  t>9-V,  June  1969* 

Bartlett,  C.  J.  and  O'Leary,  B.  S.  A  differential  prediction  model  to  moderate 
the  effects  of  heterogeneous  groups  in  personnel  selection  and  classifica¬ 
tion.  Personnel  Psychology,  1969,  22,  1-17. 

Boneau, C. A. The effects of violations of assumptions underlying the t test.
Psychological Bulletin, 1960, 57, 49-64.

Cleary, T. Anne. Test Bias: Validity of the Scholastic Aptitude Test for
Negro and White Students in Integrated Colleges. College Entrance Examination
Board, Research and Development Reports, RDR-65-6, No. 18, Educational
Testing Service, June 1966.

Equal Employment Opportunity Commission. Guidelines on Employment Testing
Procedures. Washington, D.C.: E.E.O.C., 1966.

Guion, R. M. Personnel Testing. New York: McGraw-Hill, 1965.

Guion, R. M. Employment tests and discriminatory hiring. Industrial Relations,
1966, 5, 20-37.

Gulliksen, H. Theory of mental tests. New York: Wiley, 1950.

Hays,  W.  L.  Statistics  for  psychologists.  New  York:  Holt,  Rinehart,  and 
Winston,  1963. 

Kirkpatrick, J. J., Ewen, R. B., Barrett, R. S., and Katzell, R. A. Differential
selection among applicants from different socioeconomic or ethnic
backgrounds. Final report to the Ford Foundation, New York University,
May, 1967.

Kirkpatrick, J. J., Ewen, R. B., Barrett, R. S., and Katzell, R. A. Testing
and fair employment. New York: New York University Press, 1968.

Krug, R. E. Some suggested approaches for test development and measurement.
Personnel Psychology, 1966, 19, 24-34.


Lockwood, H. C. Critical problems in achieving equal employment opportunity.
Personnel Psychology, 1966, 19, 3-9.

Lopez, F. M. Current problems in test performance of job applicants.
Personnel Psychology, 1966, 19, 10-19.

Manning, W. H. and DuBois, P. H. Correlational methods in research on human
learning. Perceptual and Motor Skills, 1962, 15, 287-321.

Mitchell, M. D., Albright, L. E., and McMurray, F. D. Biracial validation
of selection procedures in a large southern plant. Proceedings of the
76th Annual Convention of the American Psychological Association, 1968.

Potthoff, R. F. Statistical aspects of the problem of biases in psychological
tests. Chapel Hill, N.C.: University of North Carolina, Institute of
Statistics Mimeo Series No.  May 1966.

Ronan, W. W. and Prien, E. P. Toward a criterion theory: a review and analysis
of research and opinion. Greensboro, N.C.: The Richardson Foundation,
June 1966.

Ruda, E. and Albright, L. E. Racial differences on selection instruments
related to subsequent job performance. Personnel Psychology, 1968, 21,
31-41.

Saunders, D. R. Moderator variables in prediction. Educational and Psychological
Measurement, 1956, 16, 209-222.

Tenopyr, Mary L. Race and socioeconomic status as moderators in predicting
machine-shop training success. Paper presented at the annual meeting of
the American Psychological Association, Washington, D.C., 1967.

Wallace, Phyllis, Kissinger, Beverly, and Reynolds, Betty. Testing of minority
group applicants for employment. Research Report 1966-7, Equal Employment
Opportunity Commission, March 1966.

Welch, B. L. The generalization of 'Student's' problem when several different
population variances are involved. Biometrika, 1947, 34, 28-35.


APPENDIX  A 


FIGURES ILLUSTRATING POSSIBLE EFFECTS OF A
HETEROGENEOUS APPLICANT POPULATION IN PERSONNEL
SELECTION PROCEDURES
Fig. 10: Opposite Validity, Difference on Predictor Only